ABSTRACT

We consider Location-based Service (LBS) settings, where an LBS
provider logs the requests sent by mobile device users over a period 
of time and later wants to publish/share these logs. Log sharing can 
be extremely valuable for advertising, data mining research and 
network management, but it poses a serious threat to the privacy 
of LBS users. Sender anonymity solutions prevent a malicious at- 
tacker from inferring the interests of LBS users by associating them 
with their service requests after gaining access to the anonymized 
logs. With the fast-increasing adoption of smartphones and the con- 
cern that historic user trajectories are becoming more accessible, it 
becomes necessary for any sender anonymity solution to protect 
against attackers that are trajectory-aware (i.e. have access to historic
user trajectories) as well as policy-aware (i.e. they know the
log anonymization policy). We call such attackers TP-aware. 

This paper introduces a first privacy guarantee against TP-aware 
attackers, called TP-aware sender k-anonymity. It turns out that 
there are many possible TP-aware anonymizations for the same 
LBS log, each with a different utility to the consumer of the anonym- 
ized log. The problem of finding the optimal TP-aware anonymiza- 
tion is investigated. We show that trajectory-awareness renders the 
problem computationally harder than the trajectory-unaware variants
found in the literature (NP-complete in the size of the log, versus
PTIME). We describe a PTIME l-approximation algorithm for
trajectories of length l and empirically show that it scales to large
LBS logs (up to 2 million users). 

1. INTRODUCTION 

A Location-based service (LBS) [7] is an information or entertainment
service, accessible with mobile devices through the mobile
network and utilizing the geographic location of the mobile device
(e.g. "find the nearest gas station", "Thai restaurant", "hospital").
Recently, the availability and usage of Location-based services
have increased significantly because the location of mobile devices
can be computed automatically (without any input from the user) 
by the wireless network (via triangulation of mobile device signal) 
or by the mobile devices themselves (via the embedded GPS chip). 

Most of the popular Location-based services such as Facebook 
Places [2], FourSquare [3], Gowalla [5] and Loopt [6] log the LBS
requests sent by their users. The data retention policies of these 
LBSs have provisions that describe this intent. An LBS request log 
is of great value to advertisers and researchers as it can be used to 
answer queries such as "find the requests sent by users that move 
from location A to location B" or "requests sent by the same user
over a period of time". But an LBS request log may also contain 
sensitive requests that the user wishes to keep private (e.g. for the 
local campaign headquarters of a given political party, a spiritual center
for a given religion, etc.). In the event that an attacker gains
access to the LBS request log, the sender's privacy is at risk. 

In this paper we investigate how to anonymize the LBS request 
log so as to protect the identity of the LBS request senders even if 
the log falls into the hands of attackers who also gain access to a) the
sequence of location-timestamp pairs (a.k.a. trajectory) visited by
the mobile users for the duration the LBS requests are logged and b)
the anonymization policy used to provide this protection. Assumption
b) is based on a well-accepted principle of designing a private
and secure system: "The design is not a secret" [27]. Assumption
a) is a realization of the fact that an attacker can obtain the locations
visited by the users from many sources, including the wireless service
provider, location computing servers such as SkyHook [8],
or user surveillance. A recent article in the Wall Street Journal [9]
and a joint study [17] by Intel Labs, Penn State, and Duke University
provide evidence that advertisers are logging the trajectories
of mobile device users. The attacker may gain access to this
information via hacking, financial agreement or subpoena.

In the LBS context, the best-studied identity protection measure 
is known as sender k-anonymity [20][19][24][20][16], which is intended
to guarantee that the content of an LBS request and the precise
location of the users are insufficient to distinguish among the
actual sender and k-1 other possible senders. This guarantee is tar- 
geted towards the LBS request sent by the user at a given instant of 
time. The underlying model does not consider the sequence of LBS 
requests sent by the user over time. We refer to this privacy guar- 
antee as snapshot sender k-anonymity, and any solution enforcing 
it as snapshot k-anonymization. As shown below, snapshot sender 
k-anonymity protects only against attackers who are unaware of the 
user trajectories, i.e. treating requests at different instants as inde- 
pendent even if they actually originate from the same user. Typical 
snapshot anonymization algorithms [19][24][20][16] are based on
hiding the sender's precise location l in the request, substituting
instead a cloak, i.e. a region containing l. The cloak is usually
chosen from among regions of a pre-defined shape (circular, rectangular
etc.), to include locations of at least k-1 other mobile users.
We refer to the cloak selection policy as a snapshot k-anonymous
policy. We illustrate a snapshot 2-anonymous policy next.

Figure 1: User locations at $t_1$. Figure 2: User locations at $t_2$.
Table 1: Snapshot policy-aware 2-anonymous policies

Example 1. Figure 1 and Figure 2 show the location of five
users at two instants $t_1$ and $t_2$. Table 1 shows the cloak selection
policies $P_1$ and $P_2$ that select cloaks from the quadrants of a static
quad-tree based partitioning of the geographic space. Suppose at
instant $t_1$, user 1 sends an LBS request $L_1$. Policy $P_1$ anonymizes
$L_1$ by substituting the location in the request with the cloak $R_2$
(shown in Figure 1). Note that $R_2$ includes the locations of users
1, 2 and 3, and a request sent by any one of them is anonymized by $P_1$
using the same cloak $R_2$. Thus when an attacker, who has access
to the user locations at $t_1$ and policy $P_1$, observes the anonymized
request with cloak $R_2$, he cannot distinguish whether the sender is
user 1, 2 or 3. Thus $P_1$ provides snapshot sender 2-anonymity.

Suppose user 1 sends another LBS request $L_2$ at instant $t_2$. Policy
$P_2$ anonymizes $L_2$ by substituting the location in the request
with cloak $R_3$ (shown in Figure 2). An attacker who has access to
user locations at $t_2$ and policy $P_2$ cannot distinguish the sender
among users 1, 4 and 5 since requests from all these users are
anonymized using $R_3$. Thus $P_2$ provides snapshot sender 2-anonymity.
Note that policy $P_2$ does not take into account the user locations
at instant $t_1$ and their anonymizations using policy $P_1$ (and vice
versa). □

A natural first candidate solution to anonymizing LBS request logs 
is to leverage previous work on snapshot k-anonymization [19][24]
[20][16], anonymizing for each time instant t the snapshot of requests
at t (independently of how snapshots at other instants are
anonymized). Unfortunately snapshot-by-snapshot anonymization 
of a request log does not provide sender k-anonymity against an 
attacker who has access to the user trajectories for the period the 
requests are logged. The next example illustrates this point. 

Example 2. Recall the setting in Example 1 and assume that
the LBS logs the user requests. To anonymize the log, the LBS uses
policies $P_1$ and $P_2$. Moreover, user ids are replaced with meaningless
identifiers; however, in order to preserve the linkage between
requests sent from the same trajectory, the same identifier is used
for all requests sent by the same user.

Assume the anonymized request log is observed by an attacker
who knows $P_1$ and $P_2$ and the user locations at $t_1$ and $t_2$. As in
Example 1, the attacker can use the knowledge of the policies to
limit the first request's sender to one of {1, 2, 3} and the second
request's sender to one of {1, 4, 5}. Next he uses the additional
knowledge that both requests were sent from the same user trajectory:
he intersects the two sets of potential senders and concludes
that user 1 must be the sender, breaching sender 2-anonymity! □
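
The intersection attack of Example 2 can be sketched in a few lines. This is only an illustration: the function name `candidate_senders` is hypothetical, and the two candidate sets are the ones the attacker derives from the policies in Example 2.

```python
from functools import reduce

def candidate_senders(anonymity_sets):
    """Intersect the per-instant candidate sets of one pseudonymous trajectory."""
    return reduce(lambda acc, s: acc & s, anonymity_sets)

# Candidate senders derived from the snapshot policies at the two instants.
at_t1 = {1, 2, 3}   # users cloaked by R2 at t1
at_t2 = {1, 4, 5}   # users cloaked by R3 at t2

suspects = candidate_senders([at_t1, at_t2])
print(suspects)            # {1}
print(len(suspects) >= 2)  # False: sender 2-anonymity is breached
```

Each snapshot taken alone leaves three candidates; it is only the linkage of the two requests to one trajectory that shrinks the set below k = 2.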

In Example 2, the attacker is able to breach sender 2-anonymity
because the request log enables him to associate the two requests
with the same trajectory and the snapshot 2-anonymization policy $P_2$
does not take into account the anonymization of the request from user
1 at instant $t_1$ using policy $P_1$. Thus the above breach could have
been avoided if either the two requests were not linked with the
same trajectory in the anonymized log, or if instead of policy $P_2$
we used a policy $P_2'$ that anonymizes requests from users 1, 2 and
3 using the region $R_3$. While we are free to change the anonymization
policy to preserve sender k-anonymity, we do not wish to entirely
remove the association of requests with a trajectory in the
anonymized LBS request log since this is valuable information for
analytics. This poses an interesting challenge for the LBS: how can
it publish an anonymized LBS request log that includes some form
of linkage information between requests and trajectories, without
jeopardizing k-anonymity of the users?

The LBS needs to ensure anonymity against an attacker who 
knows the user trajectories for the period the LBS requests are 
logged (we call the attacker T-aware) and who knows the "design" 
i.e. the policy used to pick cloaks for anonymizing requests (P- 
aware attacker). We call this problem the offline TP-aware sender 
k-anonymity problem since an LBS request r can be anonymized 
taking into account all requests in the log, including those sent after
r. We contrast this with the problem of online TP-aware sender k- 
anonymity, in which (i) the LBS is not trusted, therefore an LBS 
request is anonymized before it is sent to the LBS, and (ii) the 
anonymization of an LBS request takes into account only the his- 
tory of requests so far and cannot be altered after observing subse- 
quent requests by the same user. We leave the problem of online 
TP-aware sender k-anonymity for future work.

In this paper, we propose a solution to the offline TP-aware sender 
k-anonymity problem. It consists of publishing a sequence of cloaks 
to anonymize the sequence of LBS requests sent by a user over a 
period of time. With each cloak in the sequence we associate a set 
of LBS requests devoid of any location and sender identity information.
The LBS requests associated with a sequence S of cloaks
represent LBS requests by users whose trajectories pass through
S. We thus preserve some association between LBS requests and 
the trajectories they were sent along (though we introduce some 
uncertainty as requests are not tied to a single trajectory, but rather 
to a "bundle" of trajectories compatible with S). To provide TP- 
aware sender k-anonymity, we choose S such that at least k distinct 
user trajectories are anonymized to S. 

The technical challenge we need to solve is to find, among the 
(exponentially) many possible ways of bundling user trajectories 
together, the one that results in the maximum utility for the con- 
sumer of the anonymized log. Intuitively, we are looking to mini- 
mize the cloak sizes, so as to improve the precision of the anonymized 
information. 

Our contributions include the following: 

[1] We identify and formulate the problem of offline sender k- 
anonymization of LBS request logs, which protects against the class 
of trajectory- and policy-aware attackers. We define a novel privacy 
guarantee, TP-aware sender k-anonymity. 

[2] We study the problem of finding, among all the offline poli- 
cies that provide TP-aware sender k-anonymity, one with the op- 
timum utility for the consumer of the anonymized LBS log. We 
show that finding the optimum offline policy that uses cloaks cho- 
sen among the quadrants of a quad-tree based partition of the map 
is NP-Complete. This is significant, showing that guarding against 
T-aware attackers is computationally harder than against T-unaware 
attackers: it was shown in [16] that for such cloak types optimum
snapshot k-anonymization is in PTIME. 

[3] We show that optimum TP-aware sender k-anonymity is PTIME-approximable
(i.e. one can always find, in polynomial time, an
anonymization whose utility is within well defined bounds relative
to the optimum utility). In particular, we describe a novel l-approximation
algorithm to anonymize an LBS request log spanning
user trajectories of length l.

[4] We implement and experimentally evaluate our anonymiza- 
tion algorithm and show that it is practical and scales well with 
the number of user trajectories: it takes less than 4 minutes to 
anonymize 2 million trajectories of length 30 for users moving 
around the San Francisco Bay area. This is a performant running 
time for an offline algorithm, especially since we show that alter- 
nate solutions are impractically slow and/or provide worse approx- 
imations. 

Paper outline. The remainder of the paper is organized as follows. 
In Section 2 we describe a prevailing model of an LBS. We define
offline TP-aware sender k-anonymity in Section 3 and describe our
solution that uses a sequence of cloaks to preserve TP-aware sender
k-anonymity while publishing some linkage information between
anonymized requests and bundles of user trajectories. In Section 4
we show that finding the optimum offline policy that provides TP-aware
k-anonymity is NP-hard and hence in Section 4.1 we propose
a polynomial time approximation algorithm. In Section 4.1.3 we
describe optimizations and our implementation of the optimized
approximation algorithm. We report on the experimental evaluation
in Section 5, discuss related work in Section 6 and conclude in
Section 7.

2. LOCATION BASED SERVICES 

This section introduces a prevailing model of location-based ser- 
vices, based on automatic computation of the location of user mo- 
bile devices. It describes various entities in the LBS ecosystem, the 
data flow among these entities and the data logged by them. 

As shown in Figure 3, there are four core elements in the delivery
of a location-based service: the user making a request, typically 
called the sender, the (wireless) Communication Service Provider, 
denoted as CSP, the location server that computes the location of 
the mobile device, denoted as LS, and the Location Based Service 
(LBS) provider, denoted as LBS. We view the CSP, Location Server 
and LBS provider to be trusted agents and assume that the commu- 
nication between them is secure. 

To access an LBS, a sender uses an application on the mobile 
device (typically provided by the LBS). The application fetches 
the location from the run-time environment on the mobile device, 
which in turn gets it from the location server. The location server 
is a specialized network component in the CSP's network, known as
Mobile Positioning Center (MPC) in the CDMA standard, that provides
access to device locations for E911 [1] and other location-based
services. The location can also be obtained from a service
that operates outside the CSP's network and can compute the location
of a mobile device using the signal strength of nearby cell
towers and WiFi access points observed on the mobile device (e.g.
SkyHook [8] and Google Location Service [4]). The application
then sends a service request containing the location and the specifics 
of information/operation requested by the sender (e.g. "car dealer- 
ship in 5 mile radius from my location" or "notify me when a friend 
is within 1 mile from my location"). The LBS provider responds to 
the LBS request using the location sent with the request. 

Henceforth we abstract from these details and focus on the treat- 
ment of location data and the LBS service requests in the LBS 
ecosystem. For simplicity of presentation (and without loss of gen- 
erality), we model a geographic area as a 2-dimensional space and 
a user's location as integer coordinates within this space.

Figure 3: LBS Model

As a mobile user moves and sends LBS requests from different
locations at different times, the LBS provider logs these requests.
Each logged request is associated with the identifier of the device
that sent the request (e.g. IP-address or MAC-address). This allows
the LBS provider to identify the requests sent by the same user in
the LBS request log and assemble a history of LBS requests for
each user. We represent a user history of length l as a triple

$(uid, \langle loc_1, loc_2, \ldots, loc_l \rangle, \langle v_1, v_2, \ldots, v_l \rangle)$

where uid is the userid of the user, $loc_i$ is the location of the user
at instant i and $v_i$ is the request sent by the user at instant i. (For
presentation simplicity and without loss of generality, we abstract
away the actual timestamps, representing them as natural numbers.)
Each request is a set of name-value pairs containing the categories
and specifics of the sought services (e.g. {(poi, rest), (cat, ital),
(dist, 2mi)} represents "find Italian restaurants within 2 miles of
my location"). An LBS request log consists of a set of user histories.
The snapshot at instant i contains for each user her location at
instant i and the request (if any) she made at i.

Example 3. Consider an LBS request log containing the histories
$u_a, u_b, u_c, u_s, u_t$ of users Alice, Bob, Carol, Sam and Tom,
whose devices have IDs $id_a, id_b, id_c, id_s$ and $id_t$, respectively:

$u_a = (id_a, \langle (1,2), (1,3) \rangle, \langle v_1, v_2 \rangle)$
$u_b = (id_b, \langle (1,2), (2,2) \rangle, \langle v_3, \_ \rangle)$
$u_c = (id_c, \langle (2,2), (2,2) \rangle, \langle \_, v_4 \rangle)$
$u_s = (id_s, \langle (2,1), (4,4) \rangle, \langle v_5, \_ \rangle)$
$u_t = (id_t, \langle (3,3), (3,3) \rangle, \langle \_, v_6 \rangle)$

From this log we can read that at instant 1, the LBS logs the requests
$v_1$, $v_3$ and $v_5$ sent by the users Alice, Bob and Sam (respectively).
Alice was at that time at location (1, 2), given as coordinates
in the 2-dimensional map. At instant 2, the LBS logs the
requests $v_2$, $v_4$ and $v_6$ sent by Alice, Carol and Tom (respectively).
Alice had by now moved to location (1, 3). The symbol _ stands for "no
request" (Bob made no request at instant 2). □
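
As an illustration of the data model, the following sketch encodes user histories and extracts a snapshot. The class and function names are hypothetical, requests are abbreviated to string labels rather than sets of name-value pairs, and `None` plays the role of "_" (no request). Only the histories of Alice, Bob, Carol and Tom from Example 3 are encoded; Sam's is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Location = Tuple[int, int]

@dataclass(frozen=True)
class UserHistory:
    uid: str
    locations: Tuple[Location, ...]          # loc_1 .. loc_l
    requests: Tuple[Optional[str], ...]      # v_1 .. v_l, None = no request

def snapshot(log: List[UserHistory], instant: int) -> Dict[str, Tuple[Location, Optional[str]]]:
    """Map each uid to its (location, request) at a 1-based instant."""
    i = instant - 1
    return {h.uid: (h.locations[i], h.requests[i]) for h in log}

log = [
    UserHistory("id_a", ((1, 2), (1, 3)), ("v1", "v2")),   # Alice
    UserHistory("id_b", ((1, 2), (2, 2)), ("v3", None)),   # Bob
    UserHistory("id_c", ((2, 2), (2, 2)), (None, "v4")),   # Carol
    UserHistory("id_t", ((3, 3), (3, 3)), (None, "v6")),   # Tom
]

snap1 = snapshot(log, 1)
print(snap1["id_a"])   # ((1, 2), 'v1')
```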

The set of user history objects contains very useful data for re- 
searchers and advertisers since it can be used to answer queries 
such as "the requests sent by users that move from location A to 
location B". As mentioned in Section 1, previous work on sender
anonymity in an LBS setting has focused on anonymizing snapshots
independently of each other [20][19][24][20][16]. Example 2
showed that applying such work to user histories by anonymizing
them snapshot-by-snapshot leads to a privacy breach. A more
holistic anonymization is called for. 

3. A NOVEL PRIVACY GUARANTEE 

In this section we describe our approach for anonymizing the set 
of user histories to provide sender k-anonymity against the class of 
attackers who are aware of user trajectories and the anonymization 
algorithm, in the sense that the attacker cannot reduce the set of 
possible suspects to fewer than k. The anonymization preserves linkage
information between LBS requests and trajectories to an extent
that does not pose any risk to the sender k-anonymity of the users. 

Bundle. The key concept we introduce to model an anonymized
user history is called a "bundle". Intuitively, a bundle object corresponds
to a set of user histories bundled together. It lists a sequence
of cloaks such that the cloak at instant i contains the locations at
instant i of all users in the bundle. The bundle also lists for each
instant i the set of requests issued at instant i by the bundled users.

More formally, a bundle is a tuple $(bid, \langle r_1, \ldots, r_l \rangle, \langle s_1, \ldots, s_l \rangle)$,
where bid is a unique bundle identifier, $r_1, \ldots, r_l$ is a sequence of
l cloaks, and $s_1, \ldots, s_l$ is a sequence of l sets of LBS requests. A
cloak is a 2-dimensional region (e.g. $[(x_1, y_1), (x_2, y_2)]$ for axis-parallel
rectangles, where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates
of the lower-left and upper-right corners of the rectangle). Recall
from Section 2 that a request is a set of name-value pairs devoid of
any identifier or location information.

Example 4. The following are examples of bundles that use
the cloaks shown in Figure 1 and Figure 2 and the requests described
in Example 3:

$b_1 = (1, \langle R_2, R_3 \rangle, \langle \{v_1, v_3, v_5\}, \{v_2, v_4\} \rangle)$

$b_2 = (2, \langle R_3, R_3 \rangle, \langle \{v_1, v_3, v_5\}, \{v_2, v_4, v_6\} \rangle)$

$b_3 = (3, \langle R_2, R_3 \rangle, \langle \{v_3\}, \{v_4\} \rangle)$

$b_4 = (4, \langle R_2, R_1 \rangle, \langle \{v_1\}, \{v_1\} \rangle)$ □



Definition 1. [Masking] Given a user history
$u = (uid, \langle loc_1, loc_2, \ldots, loc_l \rangle, \langle v_1, v_2, \ldots, v_l \rangle)$ and a bundle
$b = (bid, \langle r_1, \ldots, r_l \rangle, \langle s_1, \ldots, s_l \rangle)$, we say that b masks u if for
each $i \in [1, \ldots, l]$, $loc_i \in r_i$ and $v_i \in s_i$.

Example 5. Assuming unit length for the smallest squares in
Figure 1, bundle $b_1$ in Example 4 masks the user histories $u_a$, $u_b$,
$u_c$ and $u_s$ from Example 3. Similarly bundle $b_3$ masks the user
histories $u_b$ and $u_c$. □
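
The masking test of Definition 1 can be sketched as follows. The sketch makes several assumptions that are not in the text: cloaks are axis-parallel rectangles with inclusive boundaries, a missing request (`None`) is treated as trivially covered, and the rectangle coordinates below are hypothetical (they are not the cloaks of Figures 1 and 2).

```python
from typing import List, Optional, Sequence, Set, Tuple

Location = Tuple[int, int]
Rect = Tuple[Location, Location]   # (lower-left, upper-right), boundaries inclusive

def contains(rect: Rect, loc: Location) -> bool:
    (x1, y1), (x2, y2) = rect
    return x1 <= loc[0] <= x2 and y1 <= loc[1] <= y2

def masks(cloaks: Sequence[Rect], request_sets: Sequence[Set[str]],
          locations: Sequence[Location], requests: Sequence[Optional[str]]) -> bool:
    """True if, at every instant i, loc_i lies in r_i and v_i (if any) is in s_i."""
    if not (len(cloaks) == len(request_sets) == len(locations) == len(requests)):
        return False
    for r, s, loc, v in zip(cloaks, request_sets, locations, requests):
        if not contains(r, loc):
            return False
        if v is not None and v not in s:   # assumption: "no request" needs no witness
            return False
    return True

# Hypothetical bundle: two cloaks and the request sets attached to them.
cloaks: List[Rect] = [((0, 0), (2, 2)), ((0, 2), (2, 4))]
request_sets = [{"v1", "v3"}, {"v2"}]

print(masks(cloaks, request_sets, [(1, 2), (1, 3)], ["v1", "v2"]))   # True
print(masks(cloaks, request_sets, [(3, 3), (3, 3)], [None, "v6"]))   # False
```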

Instead of publishing the set of user histories, our proposal is that 
the LBS publish a set of bundles that masks the user history objects 
in its log. Note that within a bundle, the association of the LBS 
requests with the trajectory of the actual sender is obfuscated since 
for any two distinct instants i, j and any requests $v_i$, $v_j$ the bundle
only states that their sender locations belong to the cloaks $r_i$, $r_j$,
but not whether they were actually sent by the same user.

Ensuring sender k-anonymity consists in choosing bundle objects
such that, for each request in the bundle, an attacker cannot limit the set
of potential senders to fewer than k. We formalize this next.

Anonymization Policy. We define an anonymization policy as a
function P that, given a set of user history objects U, associates to
each user history object u a bundle object b (denoted $P(U, u) = b$)
such that b masks u. We sometimes write $P(u) = b$ when the LBS
log U is clear from the context.

Example 6. The following anonymization policy $P_3$ anonymizes
the 5 user history objects in Example 3 using the bundle objects
shown in Example 4: $P_3(u_a) = b_1$, $P_3(u_b) = b_3$, $P_3(u_c) = b_3$,
$P_3(u_s) = b_1$, $P_3(u_t) = b_2$. □

3.1 TP-aware Sender k-Anonymity 

We next define our novel privacy guarantee. To do so we need to 
formalize the class of attackers who are aware of user trajectories 
and the anonymization policy. 

Attacker Model. We target a strong information-theoretic definition
of privacy; therefore we model the attacker as a function
taking certain input to launch the attack, with no bounds on the 
computational resources expended during the attack. The only as- 
sumptions are on what input the function takes (intuitively, the in- 
formation that the attacker sees). The input comprises: 

• the anonymity degree k; 

• the specific anonymization policy P used by the LBS; 



• the trajectories of all the users (the first and second components
of each user history triple);

• the published bundles. 

We refer to the class of such attackers as Trajectory-aware and 
Policy-aware (in short, TP-aware) attackers. The attack function 
models the following attack: starting from the observation of a set 
B of bundle objects, the knowledge of trajectories of all the users 
and the anonymization policy P, the attacker reverse engineers P 
to obtain the possible user histories that are anonymized by P to 
bundles in B. 

We are now ready to define TP-aware sender k-anonymity. In- 
tuitively, we consider it a breach of sender k-anonymity if for any 
bundle b the attacker succeeds in reducing the number of candidate
user histories that can possibly be anonymized to b to fewer
than k. Therefore, our privacy guarantee ensures that for each observed
bundle object b there are at least k user histories that are
anonymized to b under the chosen policy. 

Definition 2. [TP-aware Sender k-anonymity] Let P be an
anonymization policy and U be a set of user histories. Let B be the
set of bundles obtained using the policy P. We say that B provides
TP-aware sender k-anonymity for U if for each bundle $b \in B$
there are at least k distinct user histories in U that are anonymized
to b under P. We say that policy P provides TP-aware sender k-anonymity
if, for every set of user histories U, the set of bundles
$\{P(u) \mid u \in U\}$ provides TP-aware sender k-anonymity for U.

Note that even with unlimited computational resources, the best 
the attacker can hope to achieve is to exactly reverse engineer the 
inverse image of the published bundles under P (as opposed to just 
approximating it). Since P is defined such that the inverse image 
contains at least k possible user histories for each bundle, even the 
exact inversion of P does not breach privacy. 

We next show a policy that breaks TP-aware sender 2-anonymity. 

Example 7. Policy $P_3$ of Example 6 anonymizes the set of
user histories $\{u_a, u_b, u_c, u_s, u_t\}$ shown in Example 3 to the set
of bundles $\{b_1, b_2, b_3\}$ shown in Example 4. When the TP-aware
attacker observes $b_1$, he tries to reverse engineer the user-history
objects that could have been anonymized to it. He finds two candidates,
$u_a$ and $u_s$, corresponding to users Alice and Sam. Similarly for
$b_3$, there are 2 user histories, $u_b$ and $u_c$, that could be anonymized to $b_3$.
In contrast, when the attacker observes $b_2$, there is only one user
history, $u_t$, that is anonymized to $b_2$ under $P_3$. Thus $P_3$ does not
provide TP-aware sender 2-anonymity.

We next illustrate a policy that does provide TP-aware sender 
2-anonymity. 

Example 8. For the five user histories in Example 3, we describe
the following anonymization policy $P_4$ that uses the bundles
shown in Example 4: $P_4(u_a) = b_2$, $P_4(u_b) = b_3$, $P_4(u_c) = b_3$,
$P_4(u_s) = b_2$, $P_4(u_t) = b_2$. There are at least 2 user histories
anonymized by $P_4$ to each one of the published bundles, $b_2$ and $b_3$.
When the attacker observes the published bundles, he tries to reverse
engineer $P_4$, but finds at least 2 users for each of $b_2$ and $b_3$.
Hence $P_4$ provides TP-aware sender 2-anonymity. □
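
The counting condition of Definition 2 for a fixed log can be checked mechanically, as in the sketch below. The function name is hypothetical, a policy is represented simply as a mapping from user-history labels to bundle labels (the assignments of Examples 6 and 8 as we read them), and the masking requirement is assumed to have been verified separately.

```python
from collections import Counter
from typing import Dict

def provides_tp_k_anonymity(policy: Dict[str, str], k: int) -> bool:
    """Every published bundle must be the image of at least k user histories."""
    histories_per_bundle = Counter(policy.values())
    return all(count >= k for count in histories_per_bundle.values())

# Assignment of user histories to bundles under the two example policies.
P3 = {"u_a": "b1", "u_b": "b3", "u_c": "b3", "u_s": "b1", "u_t": "b2"}
P4 = {"u_a": "b2", "u_b": "b3", "u_c": "b3", "u_s": "b2", "u_t": "b2"}

print(provides_tp_k_anonymity(P3, 2))   # False: b2 has a single pre-image, u_t
print(provides_tp_k_anonymity(P4, 2))   # True: b2 has 3 pre-images, b3 has 2
```

Because the attacker knows both the policy and the trajectories, the pre-image size of each bundle is exactly what he can compute, which is why counting pre-images suffices here.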

4. OPTIMUM-COST ANONYMIZATION 

For the same set of user-history objects there may exist several 
anonymization policies that provide TP-aware sender k-anonymity, 
raising the obvious question of which one to use. In this section we 
address the problem of finding the k-anonymous policy of highest
utility to the consumers of the published log. Prior work [16][20]
[19][24] on snapshot sender k-anonymity proposes that one way to
maximize utility is to minimize the area of the cloaks. For the log 
of LBS requests, an analogous measure would be to minimize the 
sum of the cloak areas used in the bundles. 

Cost of a bundle. We introduce the cost of a bundle to quantitatively
measure utility (maximum utility means minimum cost).
Given a bundle $b = (bid, \langle r_1, \ldots, r_l \rangle, \langle s_1, \ldots, s_l \rangle)$, we define the cost
of bundle b as the sum of the areas of the cloaks in its cloak sequence:
$Cost(b) = \sum_{i=1}^{l} area(r_i)$. Given a collection U of user-history
objects and an anonymizing policy P, we define the cost of P for
anonymizing U as $Cost(P, U) = \sum_{u \in U} Cost(P(u))$.
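
The two cost definitions translate directly into code. The sketch below assumes rectangular cloaks given by their lower-left and upper-right corners; the bundle and user labels and all coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]
Rect = Tuple[Location, Location]   # (lower-left, upper-right)

def area(rect: Rect) -> int:
    (x1, y1), (x2, y2) = rect
    return (x2 - x1) * (y2 - y1)

def bundle_cost(cloaks: List[Rect]) -> int:
    """Cost(b): sum of the areas of the cloaks in the bundle's cloak sequence."""
    return sum(area(r) for r in cloaks)

def policy_cost(policy: Dict[str, str], bundles: Dict[str, List[Rect]]) -> int:
    """Cost(P, U): sum over user histories u of Cost(P(u))."""
    return sum(bundle_cost(bundles[bid]) for bid in policy.values())

bundles = {
    "b_small": [((0, 0), (2, 2)), ((0, 2), (2, 4))],   # 4 + 4 = 8
    "b_big":   [((0, 0), (4, 4)), ((0, 0), (4, 4))],   # 16 + 16 = 32
}
policy = {"u1": "b_small", "u2": "b_small", "u3": "b_big"}

print(bundle_cost(bundles["b_small"]))   # 8
print(policy_cost(policy, bundles))      # 48
```

Note that the policy cost sums over user histories, so a bundle shared by m histories contributes its cost m times.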

Optimum policy. We next focus on the problem of finding the
optimum (minimum cost) policy that provides TP-aware sender k- 
anonymity to a given set of user-history objects. 

Notice that the TP-aware sender k-anonymity guarantee is at 
least as computationally hard to enforce as its P-aware snapshot 
(T-unaware) version (the latter is a special case of the former for 
trajectory length 1). It is therefore natural to avoid settings in which 
snapshot anonymization is already intractable. For P-aware snap- 
shot k-anonymity, it was shown in [16] that the complexity of find- 
ing the optimum policy depends crucially upon the type of cloaks 
used for anonymization. For instance, finding the optimum policy 
among all the policies that use circular cloaks is NP-hard (in the 
number of users) [16], even if the cloak centers can be chosen only
from a given set of points (e.g. public landmarks such as libraries, 
train stations or cell towers) and the only choice is on the length 
of the cloak radius. In contrast, one can find an optimum snapshot 
policy in polynomial time if the cloaks are chosen among the quadrants
of a quad tree [16]. These results suggest that, for a chance
at practically feasible anonymization, our investigation would best 
focus on quad-tree based cloaks. This is indeed the case as shown 
in the next theorem. 

Anonymization using Circular cloaks. Let U be a set of user-history
objects and SC be a set of points in the 2-dimensional space
that contains the trajectories of the users. We define a circular cloak
sequence as a sequence of cloaks where each cloak is centered at
some point from SC, with no restriction on the radius. Let $\mathcal{P}$ be
the set of all those policies that use circular cloak sequences in the
bundles for anonymizing user-history objects. The problem of Optimum
Offline TP-aware k-anonymization with Circular cloaks is to
find a policy in $\mathcal{P}$ that minimizes the cost of anonymizing U.

(Extended Version) Theorem 4. Optimum Offline TP-aware k- 
anonymization with Circular cloaks is NP-hard. 

The quad tree is a well-known structure for organizing spatial 
data, and it has been used in a number of anonymization solutions
[19][24][16] for snapshot sender k-anonymity. More importantly,
this is the only cloak class for which optimum P-aware snapshot
k-anonymization is known to be PTIME-computable [16].

Anonymization using Quad-cloaks. For the remainder of the
paper, we consider policies that use cloaks picked from among the
quadrants of a quad-tree partitioning of the geographic region. The
root node of the quad-tree represents the entire region (assumed
square shaped, without loss of generality) which is then partitioned
into 4 equal square quadrants, each of which represents a child node
of the root. Each quadrant is then again divided into 4 equal sub-quadrants
that correspond to grandchildren of the root. This four-way
splitting goes on recursively until the desired level of granularity
for the minimum region is reached. Figure 4 shows a part of
a quad-tree based partitioning: the depicted region represents a quadrant in
the quad-tree that is divided into 4 equal sub-quadrants (e.g. $R_4$).
The sub-quadrant $R_4$ is further divided into $R_0$, $R_1$, $R_2$, and $R_3$.



Given a quad-tree representation Q of a region, we refer to a 
sequence of cloaks, where each cloak is one of the quadrants of 
Q, as a quad-cloak sequence. For instance, $\langle R_0, R_3 \rangle$ is a quad-cloak
sequence of length 2 that uses the quadrants of the quad-tree
in Figure 4. A policy that anonymizes user histories using
bundles with quad-cloak sequence is referred to as a quad-cloak 
policy (since we consider only such policies for the remainder of 
the paper, we will drop the qualifier whenever convenient). 

Optimum quad-cloak policy. Given a quad-tree Q and a set
U of user histories, there exist several quad-cloak policies that can
be used to anonymize U. We show that their number is exponential.
Assume that U comprises n user histories, each of length $l$,
and the quad-tree Q is of height h. For any location in the
trajectory of a user history, there are h cloaks in Q that mask it
(all the cloaks from leaf to root in Q). Therefore, for a trajectory
of length $l$, there are $h^l$ different quad-cloak sequences masking
it. There are hence $h^{ln}$ different ways of anonymizing the n
histories in U using quad-cloak sequences from Q (although not all of
them provide TP-aware sender k-anonymity).
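
The counting argument can be illustrated with a small Python sketch.
It assumes, purely for illustration, that a quadrant is encoded as the
string of child indices (0 to 3) on the path from the root, so that the
root is the empty string and a leaf of a quad-tree of height h has
h - 1 characters. This encoding and the function names are assumptions
of the sketch, not notation from the paper.

```python
from itertools import product


def masking_cloaks(leaf):
    """All quadrants on the path from a leaf up to the root (h of them)."""
    return [leaf[:i] for i in range(len(leaf), -1, -1)]


def masking_sequences(trajectory):
    """Every quad-cloak sequence that masks a trajectory of leaf quadrants."""
    return list(product(*(masking_cloaks(loc) for loc in trajectory)))


def count_anonymizations(h, l, n):
    """Ways to pick one masking sequence for each of n trajectories."""
    return (h ** l) ** n


# Height h = 3 (leaves have 2 characters), trajectory length l = 2.
example_trajectory = ("30", "33")
example_sequences = masking_sequences(example_trajectory)
```

Here `example_sequences` holds the $h^l$ masking sequences of one
trajectory, and `count_anonymizations(h, l, n)` evaluates $h^{ln}$.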

The problem of optimum offline TP-aware sender k-anonymity
with quad-cloaks is to find, given an LBS log U, a quad-cloak
bundle B that has the minimum cost of anonymizing U. Clearly a
brute-force search among all $h^{ln}$ quad-tree policies would take
exponential time in the log size. As shown by our next result, one
cannot hope for a PTIME algorithm (unless P = NP).

Theorem 1. Optimum offline TP-aware sender k-anonymity
with quad-cloaks is NP-complete (in the size of the LBS log).

The significance of this result is that it shows that providing op- 
timum TP-aware sender k-anonymity is strictly a harder problem 
than the optimum snapshot k-anonymity studied in prior work. 

4.1 Approximation Algorithm 

The next best thing in lieu of a polynomial-time optimum so- 
lution is to find a polynomial time approximation algorithm with 
bounded approximation factor. We show such an algorithm next. 

At a high level, we proceed as follows. We restrict the choices of
cloak sequences that a policy can use to a subset of all the
possible choices of quad-cloak sequences. This amounts to identifying
a subset $S'$ of the set $S$ of all possible quad-cloak policies. The
subset $S'$ is chosen such that an optimum quad-cloak policy relative
to $S'$ can be found in polynomial time, and that this policy's cost
is within a bounded factor of the optimum quad-cloak policy in $S$.

Our algorithm utilizes a structural relationship that exists
between quad-cloak sequences of a given length $l$.

1-step Generalization and Generalization Graph. Let Q be a
quad-tree and s be a quad-cloak sequence of length $l$ that uses
quadrants of Q. Let s' be a quad-cloak sequence obtained by replacing
one of the cloaks in s with its parent in Q. We refer to s' as a
1-step generalization of s. The 1-step generalization relation induces
a directed acyclic graph over all the quad-cloak sequences of a given
length $l$ obtained using a quad-tree Q. We refer to this graph as
the Generalization Graph (G-graph for short). Figure 6 shows part
of the G-graph induced by 1-step generalization on the quad-cloak
sequences of length 2 that use quadrants of the quad-tree shown in
Figure 4.

In a G-graph of length $l$ it is easy to observe that a trajectory
of length $l$ masked by a quad-cloak sequence s is also masked by
every 1-step generalization of s. We refer to this property as the
containment property. As an example, consider the trajectory of the
user a shown in Figure 5. This trajectory is masked not only by
the quad-cloak sequence $(R_0, R_3)$ shown in Figure 6 but also by
the sequences corresponding to its ancestors in the G-graph (e.g.
$(R_0, R_4)$ and $(R_4, R_4)$).
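
The 1-step generalization relation and the containment property can be
sketched in a few lines of Python. The sketch assumes a hypothetical
encoding in which a quadrant is the string of child indices (0 to 3)
on the path from the root, so the parent of a quadrant is obtained by
dropping its last character; the function names are illustrative only.

```python
def one_step_generalizations(seq):
    """All sequences obtained by replacing exactly one cloak with its parent."""
    result = []
    for i, cloak in enumerate(seq):
        if cloak:  # the root quadrant "" has no parent
            result.append(seq[:i] + (cloak[:-1],) + seq[i + 1:])
    return result


def masks(seq, trajectory):
    """True if every location of the trajectory lies inside the matching cloak."""
    return len(seq) == len(trajectory) and all(
        loc.startswith(cloak) for cloak, loc in zip(seq, trajectory))


trajectory = ("30", "33")   # leaf quadrants visited at two snapshots
s = ("30", "33")            # a quad-cloak sequence masking the trajectory
parents_of_s = one_step_generalizations(s)
```

Every element of `parents_of_s` still masks `trajectory`, which is the
containment property; applying the function repeatedly walks up the
G-graph towards the sequence made of root quadrants.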

Our approach to finding a subset of quad-cloak policies reduces
to finding a tree-shaped subgraph of the G-graph: Given a G-graph
G of length $l$ and an LBS log U, let $\mathcal{P}_G$ be the set of
all the policies that use quad-cloak sequences from G. The problem of
optimum offline TP-aware sender k-anonymity with quad-cloaks is to
find the optimum policy in $\mathcal{P}_G$. Since this is NP-hard, we
identify a subspace T of G, and find the optimum policy in the set
$\mathcal{P}_T$ of all policies that use cloak sequences from T. The
choice of T is such that the optimum policy can be found in PTIME and
its cost is a bounded approximation of the optimum policy in
$\mathcal{P}_G$.

G-tree of a G-graph. Given a G-graph G, we define a
Generalization tree (G-tree) of G as a tree T in which every node has
bounded degree and that preserves the ancestor-descendant
relationship of G. Formally: a) The nodes of T are a subset of the
nodes in G. b) If y is the parent of x in T, then y must be an
ancestor of x in G. c) Each node in T has a finite bounded degree f
(i.e. each non-leaf node has at most f children).

Conditions a) and b) ensure that the set $\mathcal{P}_T$ of all the
policies that use the cloak sequences in T is a subset of all the
quad-cloak policies $\mathcal{P}_G$. In addition, condition b) also
preserves the G-graph containment in the corresponding G-tree. As a
result any trajectory masked by a node in T is also masked by its
parent in T. As described next, this property along with condition c)
is key in finding a polynomial time approximation solution.

Note that using the above definition, one can obtain multiple
G-trees corresponding to a G-graph. The choice of a G-tree dictates
the bounded approximation factor and the complexity of the algorithm
that achieves the bound. We address the issue of identifying
a G-tree with bounded approximation factor in Section 4.1.2. We
first describe in Section 4.1.1 a generic algorithm that takes as
input a G-tree T and finds in PTIME the optimum policy w.r.t.
$\mathcal{P}_T$.

4.1.1 Min-Cost Policy in Any G-tree 

The algorithm exploits two unique properties of the policies in
$\mathcal{P}_T$. Using the first property we define equivalence
classes of policies such that all equivalent policies have the same
cost and anonymize the same number of trajectories to each quad-cloak
sequence in T. This allows us, instead of exploring a search space of
policies, to explore a smaller search space of policy equivalence
classes.
Even though there are fewer equivalence classes than policies, the 
total number of choices is still exponential (in the number of cloak 
sequences in T). The second property allows us to use a divide 
and conquer strategy to prune and search in polynomial time the 
exponential search space of equivalence classes and find the one 
corresponding to the optimum policy. 

Property 1: Cost of a policy is determined by the number of
trajectories anonymized to each node in T. For a policy in
$\mathcal{P}_T$, the property of being TP-aware sender k-anonymous
and the cost of the anonymization depend only on how many
trajectories are anonymized by each node n in T, and not on which
particular trajectories are anonymized to n.

Example 9. Consider trajectories a and b shown in Figure 5.
Let $P_1$ and $P_2$ be two anonymization policies that anonymize
trajectories a and b as shown in Figure 7 and Figure 8. $P_1$
anonymizes a to cloak sequence $(R_0, R_4)$ and b to $(R_4, R_4)$,
whereas $P_2$ anonymizes a to $(R_4, R_4)$ and b to $(R_0, R_4)$.
Except for this difference, all the other trajectories are anonymized
identically in $P_1$ and $P_2$. Since $Cost(P_1(a)) = Cost(P_2(b))$
and $Cost(P_1(b)) = Cost(P_2(a))$, we have
$Cost(P_1(a)) + Cost(P_1(b)) = Cost(P_2(b)) + Cost(P_2(a))$
and the costs of $P_1$ and $P_2$ are identical. □





Figure 7: Policy $P_1$



Figure 8: Policy $P_2$



We formalize this observation as an equivalence relation among
policies in $\mathcal{P}_T$ that use quad-cloak sequences in G-tree T.
Two policies in $\mathcal{P}_T$ are equivalent for a given set of
trajectories if every node in G-tree T anonymizes the same number of
trajectories under both policies.

Lemma 1. If policies $P_1, P_2$ are equivalent for a G-tree T,
then (a) $P_1$ and $P_2$ have the same cost; and (b) $P_1$ provides
TP-aware sender k-anonymity on T if and only if $P_2$ does.

We exploit equivalence to replace the search space of policies
in $\mathcal{P}_T$ with the smaller space of equivalence classes. We
represent equivalence classes using a Configuration function.
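
As an illustration of the equivalence relation, the following Python
sketch replays Example 9. A policy is modelled as a dictionary from
trajectory names to cloak sequences; the areas assigned to $R_0$ and
$R_4$ are hypothetical values chosen only to make the costs concrete,
and both policies are assumed to be masking policies.

```python
from collections import Counter

# Hypothetical cloak areas: R4 is a quadrant, R0 one of its 4 sub-quadrants.
AREA = {"R0": 1, "R4": 4}


def sequence_cost(seq):
    """Cost of a cloak sequence: the sum of the areas of its cloaks."""
    return sum(AREA[cloak] for cloak in seq)


def policy_cost(policy):
    """Total cost of anonymizing every trajectory with its assigned sequence."""
    return sum(sequence_cost(seq) for seq in policy.values())


def signature(policy):
    """How many trajectories each cloak sequence (tree node) anonymizes."""
    return Counter(policy.values())


def equivalent(p1, p2):
    """Policies are equivalent if every node anonymizes the same number."""
    return signature(p1) == signature(p2)


P1 = {"a": ("R0", "R4"), "b": ("R4", "R4")}
P2 = {"a": ("R4", "R4"), "b": ("R0", "R4")}
```

`P1` and `P2` swap the roles of a and b, so they have the same
signature and, as Lemma 1(a) states for equivalent policies, the same
cost.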

Configuration The function Configuration is defined to keep 
track of the number of trajectories anonymized by each node m in 
a G-tree T. For technical convenience, this is done by equivalently 
tracking for each node m the number of trajectories that are masked 
by m yet are not anonymized using m or any of its descendants. 
We refer to these trajectories as passed up (the responsibility of 
anonymizing them is passed up to m's ancestors). 

Definition 3. [Configuration] Let U be a set of trajectories
and T be a G-tree rooted at r. Let d(m) denote the total number of
user trajectories that are masked by the cloak sequence represented
by node m. A Configuration C is a function from nodes of T to
natural numbers, such that (i) for every leaf node m,
$C(m) \le d(m)$; and (ii) for every internal node m,
$C(m) \le \sum_{i=1}^{f} C(m_i)$, where m has f children
$m_1, \ldots, m_f$. We say that C is complete if C(r) = 0.

Condition (i) in Definition 3 above restricts a configuration
to represent only masking policies and (ii) represents the fact that a
trajectory can be anonymized to only one cloak sequence. Note that
by Lemma 1(a), all policies in the equivalence class represented by
a configuration C have the same cost. We call this the cost $Cost_C$
of the configuration C. We can compute this cost directly using
the configuration and without enumerating any policy: it is easy to
compute the number a(m) of trajectories anonymized by node m
of T, as the difference between the number of trajectories passed
up by m's children and the number of trajectories passed up by
m itself; $Cost_C$ simply multiplies a(m) with the area of the cloak
sequence represented by m, summing up over all $m \in T$.

Definition 4. [Configuration cost] Let U be a set of trajectories
and C be a configuration of the G-tree T. We define the cost
of C for U, denoted $Cost_C(C, U)$, as

$$Cost_C(C, U) = \sum_{m \in nodes(T)} f(m, C) \times Cost(m)$$

where f(m, C) is given by

$$f(m, C) = \begin{cases} d(m) - C(m) & \text{if } m \text{ is a leaf node} \\ \left(\sum_{i=1}^{f} C(m_i)\right) - C(m) & \text{otherwise} \end{cases}$$

where $m_1, \ldots, m_f$ are the children of m and Cost(m) is the sum
of areas of the cloaks in the sequence corresponding to node m. □
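
The configuration cost can be evaluated directly from the tree, as the
following Python sketch suggests. The dictionary-based tree
representation (`children`, `d`, `cost`) and the toy numbers are
assumptions made for the illustration; the number of trajectories a
node anonymizes is computed as described in the text, i.e. what reaches
the node minus what it passes up.

```python
def anonymized_count(m, children, d, config):
    """Trajectories anonymized at m: those reaching m minus those passed up."""
    kids = children.get(m, [])
    reaching = d[m] if not kids else sum(config[c] for c in kids)
    return reaching - config[m]


def configuration_cost(children, d, cost, config):
    """Sum over all nodes of (trajectories anonymized at m) x Cost(m)."""
    return sum(anonymized_count(m, children, d, config) * cost[m]
               for m in config)


# Toy G-tree: root r with two leaves a and b.
children = {"r": ["a", "b"]}
d = {"a": 3, "b": 1, "r": 4}        # trajectories masked by each node
cost = {"a": 2, "b": 2, "r": 8}     # sum of cloak areas per node
config = {"a": 1, "b": 1, "r": 0}   # trajectories passed up by each node

total = configuration_cost(children, d, cost, config)
```

In this toy instance, leaf a anonymizes 2 trajectories, leaf b none,
and the root anonymizes the 2 trajectories passed up to it, so the
total is 2 x 2 + 0 x 2 + 2 x 8.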

We can show that the configuration cost is precisely the cost of 
the represented policies: 

(Extended Version) Lemma 4. Given a set U of n trajectories
of length $l$, a G-tree T of quad-cloak sequences of length $l$, a
policy P that uses cloak sequences from T and a configuration
C representing P's equivalence class, we have
$Cost_C(C, U) = Cost(P, U)$.

Thus finding the optimum quad-tree policy that uses cloak se- 
quences from T to anonymize a set U of trajectories is equivalent 
to finding the optimum configuration C of the tree T w.r.t. U. Our 
algorithm does exactly that, i.e. we first find the minimum cost 
configuration and then materialize a policy corresponding to it. 

Checking Sender Anonymity from Configurations. Since the
algorithm manipulates configurations instead of policies, we need
to check that a configuration corresponds to TP-aware sender
k-anonymous policies. By Lemma 1(b), either all represented policies
qualify, or none does. It turns out that it suffices to check
directly that the configuration satisfies a property we call
k-summing.

Definition 5. [k-summing] Let U be a set of trajectories
and C a configuration of the tree T rooted at r. C is a k-summing
configuration if

• for a leaf node m

(i) if d(m) < k, then C(m) = d(m).

(ii) if $d(m) \ge k$, then either C(m) = d(m) or
$C(m) \le d(m) - k$.

• for an internal node m, let $A = \sum_{i=1}^{f} C(m_i)$,
where $m_1, \ldots, m_f$ are the children of m in T

(iii) if A < k, then C(m) = A.

(iv) if $A \ge k$, then either C(m) = A or $C(m) \le A - k$.

Intuitively, in Definition 5 clause (i) states that if the quad-cloak
sequence corresponding to node m masks fewer than k trajectories,
none of them can be anonymized by m lest k-anonymity be compromised.
The responsibility of anonymizing all d(m) of them is
passed up to m's ancestors (C(m) = d(m)). By clause (ii), if
there are at least k trajectories, then either all of them are passed
up, or at most d(m) - k (since at least k must be anonymized to
the same cloak sequence to preserve k-anonymity). For an internal
node m, A represents the number of trajectories whose anonymization
responsibility is passed up from m's children to m. If there are
too few of them (fewer than k) then they cannot be anonymized using
the cloak sequence of m, which in turn passes the responsibility
to its ancestors (clause (iii)). Otherwise, m has the choice of
either anonymizing none of them (C(m) = A in clause (iv)), or
anonymizing at least k and passing up at most A - k.
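
The four clauses translate into a short check, sketched below in
Python. The tree is again represented by hypothetical dictionaries
(`children` for the tree structure, `d` for the number of trajectories
masked by each leaf), which are assumptions of this illustration rather
than structures defined in the paper.

```python
def is_k_summing(children, d, config, k):
    """Check clauses (i)-(iv): each node passes up everything that reaches
    it, or anonymizes at least k trajectories."""
    for m, passed_up in config.items():
        kids = children.get(m, [])
        # Leaves see d(m) trajectories; internal nodes see A, the sum of
        # what their children pass up.
        reaching = d[m] if not kids else sum(config[c] for c in kids)
        if reaching < k:
            ok = passed_up == reaching                      # (i), (iii)
        else:
            ok = (passed_up == reaching
                  or 0 <= passed_up <= reaching - k)        # (ii), (iv)
        if not ok:
            return False
    return True


children = {"r": ["a", "b"]}
d = {"a": 3, "b": 1}
good = {"a": 1, "b": 1, "r": 0}   # a anonymizes 2, r anonymizes 2
bad = {"a": 2, "b": 1, "r": 0}    # a would anonymize a single trajectory
```

With k = 2, `good` is k-summing while `bad` is not, because leaf a
would anonymize only one trajectory on its own.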

Lemma 2. Let T be a G-tree of quad-cloak sequences and U
be a set of trajectories. Let C be a configuration of T for
anonymizing U, and P be a policy in the equivalence class C
represents. P provides TP-aware k-anonymity to U if and only if C is
a k-summing configuration.

Lemma 2 justifies an algorithm that explores the space of
k-summing configurations, in search of a complete minimum-cost
configuration. But for a set of n trajectories and a G-tree T with m
nodes there are $O(n^m)$ possible configurations. Next we describe
the second property of the policies in $\mathcal{P}_T$ that enables a
divide-and-conquer approach to find the optimum k-summing
configuration.

Property 2: Optimum cost of anonymizing a subset of
trajectories using a node in T can be computed locally. Let C be a
k-summing configuration of a G-tree T of quad-cloak sequences.
For a node m in T, C(m) represents the number of unanonymized
trajectories passed up by m. These trajectories are anonymized at
one of the ancestors of m and hence they do not affect how the
d(m) - C(m) trajectories are anonymized using m and its
descendants. Thus for a given value of C(m), one can optimize the
anonymization of d(m) - C(m) trajectories using m and its
descendants independently of the rest of the trajectories and the
rest of T. Before we describe how we compute this local optimum for
each m, we need to point out that at this stage we don't know the
value of C(m) in the optimum configuration. For this reason we
compute the optimum costs of passing up 0, 1, ..., d(m) trajectories
at m, i.e. all possible values of C(m). For each pair (m, u) such
that C(m) = u, the minimum cost is computed among all possible
configurations of the subtree rooted at m (as there are many
possible configurations with C(m) = u).

Computing all local optimum costs. To compute the (local)
optimum value of passing up u trajectories at node m, the algorithm
considers all possible counts $(0, 1, \ldots, d(m_1))$,
$(0, 1, \ldots, d(m_2))$, ..., $(0, 1, \ldots, d(m_f))$ of
trajectories passed up by m's children $m_1, \ldots, m_f$
respectively. Then it recursively computes the corresponding
minimum cost for each $(m_i, u_i)$ pair. Redundant cost
re-computation for (m, u) pairs is avoided by a memoization
technique, i.e., by storing the result in the corresponding cell of a
bi-dimensional matrix M indexed by the nodes of T and values of u. To
enable the easy retrieval of the min-cost configuration from M, the
entries for node m carry, besides the minimum cost, some bookkeeping
information relating to the configurations of the children of m.

This yields the following dynamic programming algorithm
Traj-anon that, given a set U of trajectories of length $l$ and a
G-tree T with cloak sequences of length $l$, fills in a configuration
matrix M of dimension $|T| \times |U|$, where |T| represents the
number of nodes in T and |U| the number of trajectories in U. Each
entry M[m][u] in the matrix is a tuple of the form
$(x, u_1, u_2, \ldots, u_f)$, pertaining to a configuration C such
that C(m) = u, and where x is the minimum cost of passing up u
trajectories, provided that the children $m_1, m_2, \ldots, m_f$ of m
pass up $u_1, u_2, \ldots, u_f$ trajectories respectively. The
algorithm traverses the tree T bottom-up starting from the leaf
nodes, and for each node m and each $0 \le u \le d(m)$ fills in the
entry M[m][u] using the rows from child nodes $m_1, m_2, \ldots, m_f$.

Algorithm 1 Traj-anon

for 1 ≤ m ≤ |T| do
  for 0 ≤ u ≤ |U| do
    M[m][u] := (∞, 0, 0, 0, 0) {initialize}
  end for
end for
for all node m ∈ T do
  if (m is a leaf node) and (d(m) < k) then
    M[m][d(m)] := (0, 0, 0, 0, 0)
  else if (m is a leaf node) and (d(m) ≥ k) then
    M[m][d(m)] := (0, 0, 0, 0, 0)
    for 0 ≤ u ≤ d(m) - k do
      M[m][u] := (area(m) × (d(m) - u), 0, 0, 0, 0)
    end for
  else {m is a non-leaf node}
    let m_1, m_2, ..., m_f be the children of m
    for all u in F(m) do
      pick u_1 ∈ F(m_1), u_2 ∈ F(m_2), ..., u_f ∈ F(m_f) that minimize x = Σ_{i=1..f} M^1[m_i][u_i] + area(m) × ((Σ_{i=1..f} u_i) - u)
      M[m][u] := (x, u_1, u_2, ..., u_f)
    end for
  end if
end for
return M

where F(m) denotes the set [0..(d(m) - k)] ∪ {d(m)}, and M^1[i][j]
returns the first component of the tuple at M[i][j].



Function F(m) in line 16 limits the number of trajectories whose
anonymization responsibility can be passed up by m. Notice that it
rules out the values d(m) - k + 1 through d(m) - 1 since these
imply anonymizing fewer than k trajectories at m, which would
immediately compromise k-anonymity. Quantity x is the minimum
cost among all configurations C with C(m) = u that satisfy the
k-summing property. This is computed from the costs of
the configurations at the f children, and the number of trajectories
anonymized by m ($(\sum_{i=1}^{f} u_i) - u$) multiplied by area(m)
(which denotes the sum of the cloak areas in the sequence
represented by m). Recall that the cost is the first component of the
tuple stored in the matrix entry, whence the need for the projection
operation $M^1$.

Notice how the algorithm mirrors Definition 5 to ensure that
only k-summing configurations are considered. By Lemma 2, these
configurations represent only TP-aware sender k-anonymous policies.
For instance, line 8 corresponds to case (i) in Definition 5,
which prescribes that no trajectories are to be anonymized by m
(all d(m) trajectories inside the cloak sequence of m are passed
up, C(m) = d(m)). Thus by definition of $Cost_C$, the resulting
cost is 0, which is what line 8 fills into the first component
of M[m][d(m)]. Similarly, line 10 gives the cost corresponding
to the case in the first disjunct of clause (ii) of Definition 5;
line 12 corresponds to the second disjunct. We can formally prove
that:

Lemma 3. Algorithm Traj-anon computes in each M[m][u]
$= (x, u_1, u_2, \ldots, u_f)$ the minimum cost x among all k-summing
configurations C such that C(m) = u and $C(m_i) = u_i$, with
$m_1, \ldots, m_f$ the children of m.

Selecting the optimum configuration. The information in M
suffices to retrieve in PTIME a minimum-cost configuration. The
optimum configuration is obtained when the optimum cost of C(r) = 0
is computed, where r is the root node of T (the root cannot
pass up any trajectories as there is no larger cloak sequence to
anonymize them with). After that it is easy to retrieve the complete
configuration from M in polynomial time by a top-down traversal
of T. The minimum cost entry M[r][0] for root r lists for each of its
children $m_i$ the value $C(m_i) = u_i$ leading to the minimum cost.
Now inspect for each $m_i$ the $u_i$ entry in M, picking again the
minimum cost entry for passing up $u_i$ trajectories at $m_i$, and
continue recursively until all leaf nodes are reached.
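
The bottom-up fill and the top-down retrieval can be sketched together
in Python. This is a minimal illustration under stated assumptions, not
the authors' implementation: the tree is given by hypothetical
dictionaries (`children`, `d`, `area`), the matrix M is a dictionary of
dictionaries, the children's counts are enumerated by brute force, k is
assumed to be at least 1, and the k-summing condition at internal nodes
is checked explicitly against the sum of the children's counts.

```python
import math
from itertools import product


def traj_anon(children, d, area, root, k):
    """Fill M[m][u] = (minimum cost, pass-up counts chosen for m's children)."""
    M = {}

    def feasible(m):
        # F(m) = [0 .. d(m) - k] U {d(m)}
        return list(range(0, d[m] - k + 1)) + [d[m]]

    def fill(m):
        kids = children.get(m, [])
        M[m] = {}
        if not kids:
            M[m][d[m]] = (0, ())              # pass everything up
            for u in range(0, d[m] - k + 1):  # empty when d(m) < k
                M[m][u] = (area[m] * (d[m] - u), ())
            return
        for child in kids:
            fill(child)
        for u in feasible(m):
            best = (math.inf, ())
            for choice in product(*(M[c] for c in kids)):
                anonymized = sum(choice) - u
                # k-summing: m anonymizes either nothing or at least k
                if anonymized != 0 and anonymized < k:
                    continue
                cost = area[m] * anonymized + sum(
                    M[c][ui][0] for c, ui in zip(kids, choice))
                if cost < best[0]:
                    best = (cost, choice)
            if best[0] < math.inf:
                M[m][u] = best

    fill(root)
    return M


def optimum_configuration(M, children, root):
    """Top-down traversal starting from M[root][0] (the root passes up nothing)."""
    config, stack = {}, [(root, 0)]
    while stack:
        m, u = stack.pop()
        config[m] = u
        _, choice = M[m][u]
        stack.extend(zip(children.get(m, []), choice))
    return config


children = {"r": ["a", "b"]}
d = {"a": 3, "b": 1, "r": 4}
area = {"a": 2, "b": 2, "r": 8}
M = traj_anon(children, d, area, "r", k=2)
best_config = optimum_configuration(M, children, "r")
```

In this toy instance the cheapest complete configuration anonymizes
two trajectories at leaf a and passes the remaining two up to the root.
If fewer than k trajectories exist overall, `M[root]` has no entry for
0 and the retrieval fails, reflecting that no k-anonymous policy exists.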

Complexity analysis. The running time of Algorithm Traj-anon
is dominated by steps 16-18. For an internal node m, it ranges each of
$u, u_1, u_2, \ldots, u_f$ over at most |U| values (since the values
in F(m) are bounded by $d(m) \le |U|$ for every m), resulting in
$O(|U|^{f+1})$ iterations, where the degree f represents the maximum
number of children of a node $m \in T$. Summing up over all nodes m of
the tree subspace T, we obtain the complexity of Traj-anon,
$O(|T| |U|^{f+1})$. The exponent f + 1 of the polynomial depends upon
the chosen G-tree T of the G-graph.

Policy from Configuration We do not enumerate all the poli- 
cies of the equivalence class corresponding to the optimum config- 
uration. Note that a configuration C is exponentially more suc- 
cinct than an explicit listing of the policies it represents; if we 
focus on any node m alone, there are exponentially many ways 
to pick C(m) trajectories among those occurring in m. Yet, we 
can obtain one of the policies C represents in linear time by non- 
deterministically selecting the C(m) trajectories for each node m. 
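
One way to materialize a single policy from a configuration is sketched
below in Python. The selection of which trajectories to pass up is
arbitrary (here: the last ones in list order), and the dictionary-based
inputs, including `masked` mapping each leaf to the trajectories it
masks, are assumptions of the illustration.

```python
def materialize_policy(children, masked, config, root):
    """Return one policy (trajectory -> node) represented by the configuration."""
    policy = {}

    def visit(m):
        kids = children.get(m, [])
        if kids:
            pending = [t for child in kids for t in visit(child)]
        else:
            pending = list(masked[m])
        keep = len(pending) - config[m]   # trajectories anonymized at m
        for t in pending[:keep]:
            policy[t] = m
        return pending[keep:]             # the C(m) trajectories passed up

    leftover = visit(root)
    if leftover:
        raise ValueError("configuration is not complete: root passes up "
                         f"{len(leftover)} trajectories")
    return policy


children = {"r": ["a", "b"]}
masked = {"a": ["t1", "t2", "t3"], "b": ["t4"]}
config = {"a": 1, "b": 1, "r": 0}
policy = materialize_policy(children, masked, config, "r")
```

Each trajectory is visited once per level of the tree, and any other
choice of the passed-up trajectories would yield an equivalent policy
with the same cost.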

4.1.2 Choosing the G-tree for l-Approximation 

Our approach for finding an approximation solution to the problem
of optimum TP-aware sender k-anonymity using quad-tree policies
consists of a) identifying a subset $S'$ of all the possible
quad-cloak sequences and b) finding the optimum policy among those
policies that only use the cloak sequences from $S'$.

In Section 4.1.1 we described an algorithm Traj-anon that can
find the optimum policy w.r.t. any G-tree of the G-graph G of
quad-cloak sequences. In this section we show how to choose a G-tree
$T_u$ such that the optimum policy w.r.t. $T_u$ is a bounded
approximation of the overall optimum policy w.r.t. G. $T_u$ is
obtained by limiting the choice of cloak sequences to uniform cloak
sequences.

Uniform Cloak-Sequence Tree. Let D be a G-graph of quad-cloak
sequences of length $l$ that use quadrants of a quad-tree Q.
Consider a quad-cloak sequence in D in which all cloaks have the
same size. We call such a cloak sequence a uniform quad-cloak
sequence. The cloak sequences $(R_0, R_0)$, $(R_0, R_3)$ and
$(R_4, R_4)$, shown in Figure 6, are examples of uniform quad-cloak
sequences. Let s be a uniform quad-cloak sequence in D. Let $s_p$ be
the cloak sequence obtained by replacing each cloak in s with its
parent in Q. We refer to $s_p$ as the total 1-step generalization of
s. The subset of uniform quad-cloak sequences from D and the total
1-step generalization function define a tree $T_u$ as follows:

• each uniform quad-cloak sequence in D is a node in $T_u$.

• If $s_p$ is the total 1-step generalization of $s \in T_u$, then
$s_p \in T_u$ and we set $s_p$ as the parent of s.

It is easy to check that $T_u$ is a G-tree of G-graph D.

We refer to this tree as the Uniform Cloak-Sequence Tree (U- 
Tree) since it includes only and all the uniform quad-cloak se- 
quences of the G-graph. The root of the U-tree is the sequence 
of quad-cloaks corresponding to the root of the quad tree Q. The 
leaf nodes are the uniform cloak sequences where each cloak is 
a quad-cloak corresponding to a leaf node of Q. The intermedi- 
ate nodes are uniform quad-cloak sequences where each cloak is a 
quad-cloak corresponding to a non-leaf node of Q. The height of 
the U-tree is the same as that of Q, i.e. h, since for each leaf
uniform cloak sequence h - 1 successive total 1-step generalizations
lead to the root uniform cloak sequence.

Example 10. For the G-graph G shown in Figure 6, the
U-tree $T_u$ consists of the nodes on the bottom level (e.g.
$(R_0, R_0)$) and the root $(R_4, R_4)$ of G (missing G's second
level). The edges in $T_u$ connect all bottom level nodes to the root.
Notice that the parent node is obtained using total 1-step
generalization of the child nodes in $T_u$. Also the parent
$(R_4, R_4)$ and its child nodes are in an ancestor-descendant
relationship in G. Since the length of the cloak sequences is
$l = 2$, each node in $T_u$ has $4^2$ children. □
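
The parent and child relations of the U-tree can be sketched in Python
as follows. As an assumption of the sketch, a quadrant is encoded as
the string of child indices (0 to 3) on the path from the root of Q,
and the function names are hypothetical.

```python
from itertools import product


def total_generalization(seq):
    """Parent in the U-tree: replace every cloak with its parent in Q."""
    if any(cloak == "" for cloak in seq):
        raise ValueError("the root sequence has no total generalization")
    return tuple(cloak[:-1] for cloak in seq)


def u_tree_children(seq):
    """Children in the U-tree: split every cloak into one of its 4 sub-quadrants."""
    return [tuple(cloak + q for cloak, q in zip(seq, quads))
            for quads in product("0123", repeat=len(seq))]


root = ("", "")                  # uniform sequence of length l = 2
kids = u_tree_children(root)     # the 4^2 uniform sequences one level down
```

Each element of `kids` has `root` as its total 1-step generalization,
and the number of children grows as $4^l$ with the sequence length.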

Uniform policy. A policy that only uses uniform quad-cloak
sequences in the bundles is referred to as a uniform quad-cloak
policy. Note that in a uniform quad-cloak policy, the cloaks are of
the same size within a bundle, but not necessarily across bundles.

Theorem 2. Given a set U of trajectories of length $l$, a
quad-tree Q and degree of anonymity k, the cost of the optimum uniform
policy that provides TP-aware sender k-anonymity is at most $l$ times
that of the overall optimum policy that provides TP-aware sender
k-anonymity.

To obtain the optimum uniform policy that anonymizes a set U of
trajectories of length $l$, we call the Traj-anon algorithm with
U-tree $T_u$ of length $l$ as input. For a U-tree $T_u$ of length
$l$, each non-leaf node has $4^l$ child nodes. Substituting this value
for f in algorithm Traj-anon, each entry M[m][u] needs to store $4^l$
optimum costs corresponding to the $4^l$ child nodes, and the number
of iterations (Step 17) needed to compute a matrix entry is bounded by
$O(|U|^{4^l + 1})$. The obtained configuration represents the
equivalence class of policies that have the optimum cost among all the
uniform policies for anonymizing U. By Lemma 3 and Theorem 2, we have:

Theorem 3. When taking the U-tree as input, Algorithm Traj- 
anon computes an l-approximation solution to the problem of opti- 
mum offline TP-aware sender k-anonymity. 



As described earlier, the complexity of Algorithm Traj-anon
depends upon the maximum degree of a node in the input G-tree.
When the input is the U-tree, Traj-anon runs in
$O(|T_u| |U|^{4^l + 1})$. Clearly, the exponent $4^l + 1$ is
impractically high as we expect a large number of trajectories. In the
next section we describe our optimization techniques to reduce the
complexity of Traj-anon to low-degree PTIME.

4.1.3 Optimizations 

In this section we describe optimizations to reduce the
complexity of the Traj-anon algorithm without degrading the
approximation factor. Due to space limitations, we sketch the
high-level ideas, relegating details to Section 4.2.

Recall that the exponent in the complexity of the Traj-anon
algorithm is determined by the degree (branching factor) f of the
input tree, and that in the case of a U-tree, $f = 4^l$ (as each of
the $l$ cloaks in a node n is split into 4 sub-quadrants in the
children of n).

Our first optimization modifies the U-tree $T_u$ (in a strategic way)
to guarantee a bounded degree f = 4, without eliminating any
nodes from $T_u$. This reduces the complexity of finding the optimum
configuration of the new tree structure using Traj-anon, without
affecting the approximation factor. To this end, we use another
type of G-tree, called a US-tree, that splits "slower" than the U-tree.
The idea is to spread the original $4^l$-way split that a U-tree node
undergoes in a single level into $l$ US-tree levels of 4-way splits.
The first-level node (i.e., the original node) splits only the
quadrant at the first snapshot, whereas the four second-level nodes
split only the quadrants at the second snapshot, and so on. After $l$
levels in the US-tree, the resulting $4^l$ nodes become uniform
quad-cloak sequences again and are exactly the $4^l$ direct children
of the original node in the U-tree. Note that the US-tree has a
constant degree f = 4 and roughly half more nodes than the original
U-tree.
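
The slower, snapshot-by-snapshot split can be sketched in Python. The
sketch assumes the hypothetical path-string encoding of quadrants
(child indices 0 to 3 from the root of Q) and does not bound the depth
by the height of Q; it only illustrates how one $4^l$-way split is
spread over $l$ levels of 4-way splits.

```python
def us_tree_children(seq):
    """Split only the first cloak that is still at the coarser level."""
    depth = min(len(cloak) for cloak in seq)
    i = next(idx for idx, cloak in enumerate(seq) if len(cloak) == depth)
    return [seq[:i] + (seq[i] + q,) + seq[i + 1:] for q in "0123"]


def descendants_after(seq, levels):
    """Nodes reached after the given number of US-tree levels below seq."""
    frontier = [seq]
    for _ in range(levels):
        frontier = [child for node in frontier
                    for child in us_tree_children(node)]
    return frontier


root = ("", "")                              # uniform sequence, l = 2
first_level = us_tree_children(root)         # first snapshot split only
second_level = descendants_after(root, 2)    # uniform sequences again
```

For $l = 2$, `first_level` contains the 4 non-uniform intermediate
nodes and `second_level` the 16 uniform sequences that are the direct
children of `root` in the U-tree.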

The Traj-anon algorithm can be applied to the US-tree $T_{us}$,
yielding an improved time complexity $O(|T_{us}| |U|^5)$ (just
substitute 4 for f in the complexity analysis of Section 4.1.1).
Furthermore, since all nodes in the original U-tree $T_u$ are included
in the corresponding US-tree $T_{us}$, Traj-anon is guaranteed to find
a policy in $T_{us}$ that is no worse than what it can find in $T_u$.
In fact, the optimal policy from a US-tree can have a potentially
better cost because there are more nodes (i.e., quad-cloak sequences)
to choose from.

While the above improvement leads to a polynomial time algo- 
rithm with constant exponent 5, this is still impractically high given 
the typical number of LBS users in a metro city (in the range of 1 
million for a city like San Francisco). 

We apply a second optimization idea: we adapt our algorithm
from quad-tree to binary-tree partitioning of the space, i.e. each
quadrant can be split into 2 semi-quadrants, rather than 4
sub-quadrants. We construct $T_{usb}$, the binary tree built from
$T_{us}$ above by extending it with nodes obtained by splitting
quadrants into 2 semi-quadrants. This immediately lowers the node
degree bound from f = 4 to f = 2, yielding complexity
$O(|T_{usb}| |U|^3)$.

In the next section we describe a succession of additional opti- 
mizations yielding the Smart Traj-anon algorithm that has reduced 
complexity of 0{\Tusb\{kh)'^), where h is the height of Tusb and 
k is the desired level of anonymity. As an additional optimization, 
we do not materialize the complete binary semi-quad tree: instead, 
we split a (semi-)quadrant only if one of its 2 children contains 
at least k trajectories. Notice that due to this construction
$|T_{usb}|$ depends on the spatial distribution of trajectories, but
is bounded in the worst case by the number of trajectories |U|.
Therefore for fixed k and h, Smart Traj-anon scales linearly with
|U|.



4.2 Optimizations Details (Extended Version) 

4.2.1 US-tree 

Given a T-uniform tree $T_u$ of uniform cloak sequences of length
$l$, we introduce intermediate nodes (cloak sequences of length $l$)
between a non-leaf node $m \in T_u$ and its child nodes such that the
resulting structure has the following properties:

• it is a G-tree . 

• has all the nodes of Tu (and some additional nodes). 

• y is an ancestor of x in this G-tree, if y is the parent of x in $T_u$.

• each non-leaf node in this G-tree has exactly 4 child nodes. 

We refer to this G-tree as the US-tree. The nodes that are inserted
between a node $m \in T_u$ and its children are not uniform cloak
sequences and are obtained by ordered 1-step generalization, which
we describe next.

Ordered 1-step generalization. As described earlier, there are $l$
1-step generalizations of a cloak sequence of length $l$, one
corresponding to each cloak in the cloak sequence. Ordered 1-step
generalization refers to the process of obtaining $l$ cloak sequences
by $l$ "successive" 1-step generalizations, such that the ith 1-step
generalization is obtained by replacing the ith cloak in the cloak
sequence obtained by the (i - 1)th 1-step generalization.

Given a T-uniform tree $T_u$ of length $l$, we obtain the US-tree by
inserting intermediate nodes, between node m and its $4^l$ child
nodes, that are obtained by ordered 1-step generalizations of the
child nodes. For each child node, we obtain $l$ cloak sequences using
ordered 1-step generalization, and during the computation of ordered
1-step generalization, the cloak sequence obtained by the ith 1-step
generalization is made a parent of the (i - 1)th 1-step
generalization. For each child node, we obtain the node m in the
$l$th 1-step generalization.

We use only one node to represent a cloak sequence even if it is
obtained via 1-step generalizations of two or more nodes. For
all the child nodes, consider the 1st 1-step generalization in the
ordered 1-step generalization. Since each quadrant in Q has 4 child
nodes, there are 4 child nodes that have identical 1st 1-step
generalization. Thus each of these intermediate nodes has 4 child
nodes. Similarly each cloak sequence obtained in the ith 1-step
generalization is common to 4 intermediate nodes that were obtained
in the (i - 1)th 1-step generalization. In the $l$th 1-step
generalization we obtain the node m from the 4 intermediate nodes
obtained by the (l - 1)th 1-step generalization.

Thus even after inserting the intermediate nodes, the resulting
structure is a US-tree, containing all the nodes of $T_u$, where each
non-leaf node has exactly 4 child nodes. Moreover, since the parent
cloak sequence is a 1-step generalization of its child, the parent
nodes in the G-tree completely mask their child nodes.

Next we adapt the Traj-anon algorithm to find the optimum
configuration C for a given US-tree $T_{us}$ and a set U of users. C
represents the equivalence class of policies that has the optimum cost
among policies that use cloak sequences from $T_{us}$ to anonymize
the set U of user-history objects. Moreover the cost of these
policies is never worse than the cost of the optimum policy that uses
only uniform cloak sequences, i.e. nodes in $T_u$.

Algorithm Modified Traj-anon

for 1 ≤ m ≤ |T_us| do
    for 0 ≤ u ≤ |U| do
        M[m][u] := (∞, 0, 0, 0, 0)   {initialize}
    end for
end for
for all nodes m ∈ T_us do
    if (m is a leaf node) and (d(m) < k) then
        M[m][d(m)] := (0, 0, 0, 0, 0)
    else if (m is a leaf node) and (d(m) ≥ k) then
        M[m][d(m)] := (0, 0, 0, 0, 0)
        for 0 ≤ u ≤ d(m) − k do
            M[m][u] := (area(m) × (d(m) − u), 0, 0, 0, 0)
        end for
    else {m is a non-leaf node}
        let m_1, m_2, ..., m_4 be the children of m
        for all u in F(m) do
            pick u_1 ∈ F(m_1), u_2 ∈ F(m_2), ..., u_4 ∈ F(m_4)
            that minimize the quantity
                x = Σ_{i=1..4} M^1[m_i][u_i] + (area(m) × ((Σ_{i=1..4} u_i) − u))
            where F(m) denotes the set [0..(d(m) − k)] ∪ {d(m)},
            and M^1[i][j] returns the first component of the
            tuple at M[i][j]
            M[m][u] := (x, u_1, u_2, ..., u_4)
        end for
    end if
end for
return M

Since T_us is a quad-tree, the complexity of the above algorithm,
dominated by steps 16-18 (the loop over F(m) at each non-leaf node),
is O(|T_us| |U|^5), where |T_us| represents the number of nodes in
T_us and |U| represents the number of user-history objects. Note
that ordered 1-step generalization inserts O(4^l) nodes between a
non-leaf node and its children in T_u; therefore the resulting
US-tree T_us has O(m · 4^l) more nodes than T_u, but this number is
independent of the number of trajectories and is hence a constant
factor. The reduced complexity results in reduced running time for
finding the optimum configuration, as observed in our experiments.
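
The dynamic program above can be sketched in Python as follows. This
is a minimal illustration under stated assumptions, not the authors'
implementation: nodes are hypothetical dicts with an `area` field
(standing in for the cost of the cloak sequence), leaves carry a
`count` equal to d(m), and the k-summing requirement (a node
anonymizes either no trajectory or at least k) is enforced explicitly
inside the "pick" step. The sketch accepts any number of children, so
it covers both the quad and the binary case.

```python
from itertools import product
from math import inf


def feasible_counts(d, k):
    """F(m): the set [0..d-k] united with {d}."""
    return list(range(0, d - k + 1)) + [d]


def traj_anon_dp(node, k):
    """Return (d, M) for `node`; M[u] = (cost, child choices) when the
    node passes up u unanonymized trajectories to its ancestors."""
    children = node.get("children", [])
    if not children:  # leaf node
        d = node["count"]
        M = {d: (0, ())}  # anonymize nothing here
        for u in range(0, d - k + 1):  # anonymize d-u >= k trajectories
            M[u] = (node["area"] * (d - u), ())
        return d, M

    results = [traj_anon_dp(child, k) for child in children]
    d = sum(dc for dc, _ in results)
    tables = [Mc for _, Mc in results]
    M = {}
    for u in feasible_counts(d, k):
        best = (inf, ())
        # The "pick" step: try every combination of child entries.
        for choice in product(*(t.keys() for t in tables)):
            here = sum(choice) - u  # trajectories anonymized at this node
            if here != 0 and here < k:  # assumption: k-summing constraint
                continue
            cost = sum(t[ui][0] for t, ui in zip(tables, choice))
            cost += node["area"] * here
            if cost < best[0]:
                best = (cost, choice)
        if best[0] < inf:
            M[u] = best
    return d, M


# Hypothetical toy tree: one root cloak with four leaf cloaks.
example_tree = {
    "area": 16,
    "children": [
        {"area": 4, "count": 3},
        {"area": 4, "count": 1},
        {"area": 4, "count": 0},
        {"area": 4, "count": 2},
    ],
}
example_d, example_M = traj_anon_dp(example_tree, k=2)
```

In this toy instance the entry `example_M[0]` describes the cheapest
way to anonymize all six trajectories: two at the first leaf, two at
the last leaf, and the remaining two at the root.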

Even though Traj-anon for a US-tree of length l has reduced
complexity in comparison to Traj-anon for a T-uniform of length
l, we do not trade cost to achieve better complexity. This is shown
by the following result.

(Extended Version) Lemma 5. Given a US-tree T_us of length
l, a T-uniform T_u of length l, a set U of user-history objects of
length l and the level of anonymity k, the cost of the optimum
k-summing configuration for T_us is never more than the cost of the
optimum k-summing configuration for T_u.

As a result, the upper bound on the approximation ratio, i.e. l,
still holds for a US-tree. This follows directly from Lemma 5.

(Extended Version) Lemma 6. Algorithm Modified Traj-anon
computes an l-approximation solution to the problem of optimum
offline TP-aware sender k-anonymity.

Moreover, the average-case cost of the optimum policy that uses T_us
is lower than that of the policy that uses T_u, since when using
T_us a policy has more options for choosing a sequence of cloaks to
anonymize a user object.

While the above optimization leads to a polynomial-time algorithm
with constant exponent, the degree 5 is still high given the typical
number of LBS users in a metro city (in the range of 1 million
for a city like San Francisco). Next we describe a pair of
optimizations to further reduce the complexity of Traj-anon, and
run-time optimizations to achieve practical running time, while
guaranteeing to preserve the approximation bound. These
optimizations are similar in spirit to the optimizations for
quad-tree based snapshot policy-aware sender k-anonymity described
in [16], with the difference that the tree in our study consists of
a quad-tree of quad-cloak sequences.

4.2.2 From Quad to Binary Tree 

In the US-tree, if anonymizing a trajectory to a node does not
provide the desired k-anonymity, the next possible option is the
parent node. Since the parent node is a 1-step generalization of the
child, the cost of the new cloak in the parent cloak sequence is 4
times that of the replaced cloak in the child cloak sequence. As
shown in [24] and [16], the granularity of this cost increase can be
reduced by converting a quad-tree into a binary tree by using
semi-quadrants as cloaks (where a semi-quadrant is obtained by
splitting a quadrant into two rectangles, either vertically or
horizontally). Using semi-quadrants in the uniform cloak sequences
to anonymize the trajectories leads to the following:

• The Traj-anon algorithm finds the optimum policy among all
the policies that use uniform semi-quadrant cloak sequences.

• The policy obtained using Traj-anon is an l-approximation of
the optimum policy that uses semi-quadrant cloak sequences.

• Using total 1-step generalization we can obtain a binary
US-tree that contains all the uniform semi-quadrant cloak
sequences, and in which each node has exactly 2 child nodes.

As a result of this optimization, the complexity of the optimized
Traj-anon for a binary US-tree T_usb and a set of trajectories U is
O(|T_usb| |U|^3). In addition, since the semi-quadrants are smaller
than the quadrants, this optimization also reduces the average cost
of anonymization.
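
As a small illustration of why semi-quadrants give a finer cost
granularity, the sketch below splits a rectangle in two steps. The
function names and the (x0, y0, x1, y1) box representation are
assumptions made for this example, and the choice to split
vertically first is arbitrary.

```python
def split_in_half(box, vertical=True):
    """Split a rectangle (x0, y0, x1, y1) into two equal halves."""
    x0, y0, x1, y1 = box
    if vertical:
        mid = (x0 + x1) / 2
        return [(x0, y0, mid, y1), (mid, y0, x1, y1)]
    mid = (y0 + y1) / 2
    return [(x0, y0, x1, mid), (x0, mid, x1, y1)]


def box_area(box):
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)


parent = (0, 0, 8, 8)
# First split yields two semi-quadrants, second split yields quadrants.
semi_quadrants = split_in_half(parent, vertical=True)
quadrants = [q for s in semi_quadrants
             for q in split_in_half(s, vertical=False)]

# Moving one level up the binary tree doubles the cloak area,
# whereas one level up the quad-tree quadruples it.
ratio_binary = box_area(semi_quadrants[0]) / box_area(quadrants[0])
ratio_quad = box_area(parent) / box_area(quadrants[0])
```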

4.2.3 Pruning Suboptimal Configurations 

For any node m of the USeq-Btree, in the for loop of step 16,
Traj-anon inspects (d(m) − k + 1) configurations (all possible
k-summing configurations). We observe that some of these
configurations need not be considered, as they are guaranteed to be
sub-optimal. In fact we claim the following lemma:

(Extended Version) Lemma 7. For a node m with height h(m)
(where the height of the root is 0), any configuration in which m
passes up to its ancestors the cloaking responsibility of more than
k(h(m) + 1) but less than d(m) trajectories is not optimal.

By Lemma 7 it suffices to compute k(h(m) + 1) configurations,
by simply replacing function F in step 16 of algorithm Modified
Traj-anon with function F'(m) = [0..(k(h(m) + 1))] ∪ {d(m)}.
Thus for a non-leaf node m, the algorithm computes O(kh)
configurations, and to compute each such configuration the "pick"
action iterates over O(kh) configurations of each of m's two
children. This leads to a new upper bound on the overall running
time, O(|T_usb| (kh)^3).
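
A minimal sketch of the pruned set F'(m) follows. The function name
is hypothetical, and capping the range at d − k (so that F' never
contains counts that the original F excludes) is an assumption of
this sketch rather than something stated in the text.

```python
def pruned_counts(d, k, h):
    """F'(m) = [0..k(h+1)] united with {d}, for a node with d = d(m)
    masked trajectories at height h = h(m).

    Assumption: the range is additionally capped at d - k, since a
    node cannot pass up between d-k+1 and d-1 trajectories."""
    upper = min(k * (h + 1), d - k)
    return list(range(0, upper + 1)) + [d]


# With d(m) = 1000, k = 50 and h(m) = 3 the unpruned F(m) has
# 952 entries, while F'(m) has only 202.
unpruned_size = len(range(0, 1000 - 50 + 1)) + 1
pruned_size = len(pruned_counts(1000, 50, 3))
```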

4.2.4 Precomputation 

Similar to Bulkdp, there is significant overlap in the
computations across iterations of the for loop in step 16 of
Modified Traj-anon. For example, if one iteration works on the M
entry for (m, u), inspecting for instance (m_1, u_1) and (m_2, u_2)
such that u_1 + u_2 = u, then the next iteration (m, u + 1) will
inspect the cases (m_1, u_1 + 1), (m_2, u_2) and (m_1, u_1),
(m_2, u_2 + 1), among others. The idea is to reuse this computation
across iterations.

To this end, we stage the computation in 2 parts. In the first
stage we iterate over the O(kh) configurations of both children to
compute a temporary matrix temp. There are O(kh) entries in this
matrix and the complexity of this stage is bounded by O((kh)^2). In
the second stage, we create O(kh) configurations using the O(kh)
entries of temp. Thus the running time for the second stage is also
bounded by O((kh)^2). Therefore the overall complexity of the
modified step 16 is O((kh)^2) and the overall complexity of the
modified algorithm becomes O(|T_usb| (kh)^2).
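
The two stages can be sketched as follows for a binary node. This is
an illustration under assumptions: tables are plain dicts mapping a
passed-up count to a cost, `temp` is keyed by the total number of
trajectories the two children pass up, and the k-summing condition
is checked explicitly in the second stage.

```python
from math import inf


def combine_children(M1, M2):
    """Stage 1: temp[s] = cheapest way for the two children to pass
    up s trajectories in total (quadratic in the table sizes)."""
    temp = {}
    for u1, c1 in M1.items():
        for u2, c2 in M2.items():
            s = u1 + u2
            if c1 + c2 < temp.get(s, inf):
                temp[s] = c1 + c2
    return temp


def node_table(temp, counts, area, k):
    """Stage 2: for each u in `counts`, scan temp once instead of
    re-enumerating all pairs of child entries."""
    M = {}
    for u in counts:
        best = inf
        for s, cost in temp.items():
            here = s - u  # trajectories anonymized at this node
            if here == 0 or here >= k:
                best = min(best, cost + area * here)
        if best < inf:
            M[u] = best
    return M


# Hypothetical child tables for k = 2 (children mask 3 and 2 users).
left = {3: 0, 0: 12, 1: 8}
right = {2: 0, 0: 8}
temp = combine_children(left, right)
parent_table = node_table(temp, counts=[0, 1, 2, 3, 5], area=8, k=2)
```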

4.3 Runtime Pruning 

We implement a runtime optimization to further reduce the running
time of the modified Traj-anon. We create the binary US-tree
top-down by successively splitting the semi-quadrants, starting
from the root node. However, we do not eagerly materialize all
nodes of the binary US-tree; instead, we split a (semi-)quadrant
only if it contains sufficient users to maintain anonymity.
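
A rough sketch of such lazy, top-down construction is given below.
Several simplifications are assumptions of this example: users are
single-snapshot points rather than trajectories, "sufficient users"
is taken to mean at least k, splits alternate between vertical and
horizontal, and `max_depth` is a hypothetical cut-off.

```python
def build_lazy(points, box, k, depth=0, max_depth=6):
    """Build a binary tree of (semi-)quadrants, splitting a node
    only when it holds at least k points."""
    x0, y0, x1, y1 = box
    node = {"box": box, "count": len(points), "children": []}
    if len(points) < k or depth == max_depth:
        return node  # not enough users below this node: stop here
    if depth % 2 == 0:  # vertical split
        mid = (x0 + x1) / 2
        parts = [((x0, y0, mid, y1), [p for p in points if p[0] < mid]),
                 ((mid, y0, x1, y1), [p for p in points if p[0] >= mid])]
    else:  # horizontal split
        mid = (y0 + y1) / 2
        parts = [((x0, y0, x1, mid), [p for p in points if p[1] < mid]),
                 ((x0, mid, x1, y1), [p for p in points if p[1] >= mid])]
    node["children"] = [build_lazy(ps, b, k, depth + 1, max_depth)
                        for b, ps in parts]
    return node


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node["children"])


users = [(1, 1), (1, 2), (2, 1), (7, 7)]
lazy_tree = build_lazy(users, (0, 0, 8, 8), k=2, max_depth=2)
```

Here the sparsely populated right half is never split, so the tree
has 5 nodes instead of the 7 of a fully materialized depth-2 tree.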

5. EXPERIMENTS 

In this section we describe a set of experiments to evaluate the ef- 
fectiveness of our Smart (optimized) Traj-anon algorithm. We eval- 
uate scalability and performance and compare the cost of anonymiza- 
tion and execution time of Smart Traj-anon with a set of alter- 
nate anonymization techniques. Since we focus exclusively on the 
Smart version of Traj-anon, we will drop the qualifier in the re- 
mainder of the section. 

Our experiments show that Traj-anon scales linearly with the 
number of trajectories and can anonymize up to 2 million trajecto- 
ries of length 30 within 4 min. We show that the other anonymiza- 
tion techniques either have higher anonymization cost (up to 100 
times higher) or running time (up to 2000 times slower). 

Trajectory Data. Due to legal hurdles we could not obtain
actual user trajectory data from LBS providers, so we instead used
the Brinkhoff generator [14] to generate the trajectory
data for our experiments. The Brinkhoff generator has been widely
used to generate moving object data for studies in various fields, 
including location-based services and beyond. It takes as input the 
road network of a metro area and generates trajectories of various 
classes of moving objects that are constrained by the road network. 
The classes differ in number and speed with which the trajectories 
move relative to each other (e.g. cars, bikes, pedestrians, etc.). We 
generated a master data set of 2 million trajectories of length 30, 
with 5 different classes of moving objects, using the actual road 
network of the San Francisco Bay area. Then we drew random 
samples of increasing number of trajectories (10k, 50k, 100k etc.) 
of length 10 and 30. 

Platform. Unless otherwise stated, all our experiments run on
a Linux server with an Intel Xeon processor (2.8 GHz) and 32 GB of
memory. In one experiment, we had to use a machine with an Intel
Pentium Core2 Duo processor (2.4 GHz) with 2 GB RAM, running
Cygwin on Windows XP, because the binary we got from the
authors of [30] was compiled for that configuration.

Anonymity degree. In all experiments, k = 50. 

5.1 Scalability 

In the first set of experiments we evaluate the scalability of the 
Traj-anon algorithm by increasing the number of trajectories to be 
anonymized, from 10k to 2 million, for a fixed k=50 (we consider 
both trajectory length 10 and 30). As shown in Figure 9, the
algorithm scales linearly with the number of trajectories. In
particular, Traj-anon anonymizes 2 million trajectories of length
30 in less than 4 min. Figure 10 breaks down the running time into a)
loading the user trajectories from a file to the main memory data 
structures, b) obtaining the optimum configuration for the user tra- 
jectories and, c) obtaining the policy from the configuration (as ex- 
pected this time is negligible, under 1%). 



5.2 Related anonymization techniques 

We are unaware of any competing TP-aware sender anonymity
solutions. As detailed in Section 6, the previously proposed
algorithms for trajectory-aware sender k-anonymity do not defend
against policy-aware attackers, and as shown in Section 1,
policy-aware snapshot sender k-anonymizing algorithms [16] do not
defend against trajectory-aware attacks. Since we could not find
direct competitors, we created some by leveraging existing work.

As a baseline approach we decided to extend an algorithm for
policy-aware snapshot sender k-anonymity to TP-aware sender
k-anonymity. We chose the Bulkdp algorithm in [16] since it
provides the optimum anonymization for a snapshot and uses
(semi-)quadrant cloaks (just like Traj-anon).

We also considered solutions proposed for trajectory anonymity, 
a privacy problem orthogonal to sender anonymity. In trajectory 
anonymization, the goal is to anonymize user trajectories such that 
an attacker, who knows locations of users in certain snapshots (par- 
tial trajectories), cannot infer whether a user's trajectory passes 
through a particular location. Trajectory anonymity tries to hide 
the user's whereabouts, while sender anonymity assumes them as 
known and focuses instead on hiding the identity of request senders. 
Due to the different goals and assumptions of the two privacy guar- 
antees, some of the data transformation techniques employed in 
trajectory anonymization (such as deletion of locations, addition of 
locations, and shifting locations from a trajectory), do not apply to 
sender anonymity. Despite the differences, we identified a class of
trajectory anonymization solutions whose techniques can in
principle be adapted to provide TP-aware sender k-anonymity. This
class of solutions uses some clustering algorithm to partition user
trajectories into groups of k trajectories and then applies other
data transformations (described above) to preserve trajectory
anonymity. We realized that one can adapt the clustering techniques
to obtain the bundles used in offline TP-aware sender
k-anonymization. We borrowed the clustering techniques from
state-of-the-art trajectory anonymization solutions [25, 30] to
obtain three different competing solutions for offline TP-aware
sender k-anonymity.

Next we describe these three solutions along with the baseline 
approach based on snapshot P-aware sender k-anonymity. 

Baseline TP-aware is based on the P-aware snapshot anonymization
algorithm Bulkdp described in [16]. We format the input
trajectory data as a sequence of snapshots. We anonymize the 
first snapshot of the input trajectory data using Bulkdp and group 
together the trajectories whose locations in the first snapshot are 
anonymized to the same cloak. Since Bulkdp provides policy- 
aware sender k-anonymity each group must have at least k mem- 
bers. For each group and for each snapshot, we find the smallest 
quadrant that masks the locations of the trajectories in the group. 
Thus for each group we obtain a sequence of quadrants that plays 
the role of a bundle in the sense of Traj-anon. This anonymization 
provides TP-aware sender k-anonymity since there are at least k 
trajectories that are anonymized to the same sequence of cloaks. 
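
The per-snapshot step of this baseline, finding the smallest
quadrant that masks a group's locations, can be sketched as follows.
The helper names, the (x0, y0, x1, y1) box representation and the
`max_depth` cut-off are assumptions of this example; the grouping by
Bulkdp itself is not reproduced here.

```python
def smallest_quadrant(points, box, max_depth=16):
    """Descend the quad-tree from `box` while all points fall into
    the same child quadrant; return the last common quadrant."""
    x0, y0, x1, y1 = box
    for _ in range(max_depth):
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        sides = {(p[0] >= mx, p[1] >= my) for p in points}
        if len(sides) != 1:  # points straddle several children
            break
        right, top = next(iter(sides))
        x0, x1 = (mx, x1) if right else (x0, mx)
        y0, y1 = (my, y1) if top else (y0, my)
    return (x0, y0, x1, y1)


def quadrant_bundle(group, box, max_depth=16):
    """One quadrant per snapshot for a group of equal-length
    trajectories (each a list of (x, y) locations)."""
    return [smallest_quadrant(list(snapshot), box, max_depth)
            for snapshot in zip(*group)]


group = [[(1, 1), (5, 1)],
         [(3, 3), (5, 1)]]
bundle = quadrant_bundle(group, (0, 0, 8, 8), max_depth=2)
```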

Fast Clustering is based on the fast TGA algorithm in [25]. It
creates a cluster of k trajectories by first randomly selecting an
unanonymized trajectory as the center of the cluster and then adding
its k−1 nearest neighbor trajectories to the cluster. The distance
between two trajectories is the sum of the "distances" between their
locations in each snapshot, and the distance between two locations
is the logarithm of the area of the axis-parallel minimum
bounding rectangle (a rectangle whose sides are parallel to the x
and y axes of a 2-dimensional plane) that masks the two locations.
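
The distance just described can be written down directly. In the
sketch below the function names are hypothetical, and the `min_area`
floor is an assumption added to avoid taking the logarithm of zero
when two locations share a coordinate; the text does not say how the
original algorithm handles that case.

```python
import math


def location_distance(p, q, min_area=1.0):
    """Log of the area of the axis-parallel MBR of two locations."""
    area = abs(p[0] - q[0]) * abs(p[1] - q[1])
    return math.log(max(area, min_area))  # min_area: assumed floor


def trajectory_distance(t1, t2, min_area=1.0):
    """Sum of per-snapshot location distances between two
    trajectories of equal length."""
    return sum(location_distance(p, q, min_area)
               for p, q in zip(t1, t2))


d_example = trajectory_distance([(0, 0), (0, 0)], [(2, 3), (0, 5)])
```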

Slow Clustering is based on the multi TGA algorithm in [25].
To create a cluster of k trajectories, it first randomly selects an
unanonymized trajectory as the center of the cluster and adds k−1
additional trajectories one by one so as to minimize the cost of the
cluster. The cost of a cluster is the sum, over all snapshots, of
the logarithms of the areas of the axis-parallel MBRs that mask the
locations of the trajectories in the cluster.
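
The greedy growth of one cluster can be sketched as follows. This is
an illustration only: the center is passed in explicitly rather than
drawn at random, ties are broken by index, and the `min_area` floor
guarding against a zero-area MBR is an assumption of this sketch.

```python
import math


def cluster_cost(cluster, min_area=1.0):
    """Sum over snapshots of log(area of the MBR of all locations)."""
    cost = 0.0
    for snapshot in zip(*cluster):
        xs = [x for x, _ in snapshot]
        ys = [y for _, y in snapshot]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        cost += math.log(max(area, min_area))  # min_area: assumed floor
    return cost


def greedy_cluster(trajectories, center, k):
    """Grow a cluster of k trajectory indices around `center`, each
    time adding the trajectory that yields the cheapest cluster."""
    members = [center]
    remaining = set(range(len(trajectories))) - {center}
    while len(members) < k and remaining:
        best = min(
            sorted(remaining),
            key=lambda j: cluster_cost(
                [trajectories[i] for i in members] + [trajectories[j]]),
        )
        members.append(best)
        remaining.remove(best)
    return members


trajs = [[(0, 0), (0, 0)],
         [(10, 10), (10, 10)],
         [(2, 2), (2, 2)],
         [(3, 1), (1, 3)]]
members = greedy_cluster(trajs, center=0, k=3)
```

Every candidate is re-evaluated against the whole cluster at every
step, which is what makes this variant much slower than the
nearest-neighbor variant.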

Hilbert-based Clustering [30] uses an embedding of two-dimensional
into one-dimensional space, associating to each location a
Hilbert index which is then used to simplify the nearest-neighbor
computation used to cluster trajectories. The original approach
requires identification of certain locations in a trajectory as
quasi-identifiers (uniquely identifying the user). Since we assume
that the entire user trajectory is accessible to the attacker, every
location in a trajectory is a potential quasi-identifier. Thus in
the input to the Hilbert-based clustering algorithm we specify every
location of a trajectory as a quasi-identifier. As a result, the
distance between two trajectories is the sum of the absolute
differences between the Hilbert indexes of the locations in each
snapshot. Since the algorithm computes the clusters of k−1 nearest
neighbors for each trajectory independently, two clusters can have
some trajectories in common. Clusters with common trajectories are
merged.
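
For illustration, the sketch below computes a Hilbert index with the
standard iterative cell-to-index conversion and uses it in the
trajectory distance described above. It assumes locations have been
discretized to integer cells of an n × n grid with n a power of two;
the function names are hypothetical, and the code is not taken from
the implementation of [30].

```python
def hilbert_index(n, x, y):
    """Index of grid cell (x, y) along the Hilbert curve that fills
    an n x n grid (n must be a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_trajectory_distance(n, t1, t2):
    """Sum over snapshots of |index(p) - index(q)|."""
    return sum(abs(hilbert_index(n, *p) - hilbert_index(n, *q))
               for p, q in zip(t1, t2))


order_2x2 = [hilbert_index(2, x, y)
             for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]]
dist = hilbert_trajectory_distance(2, [(0, 0), (1, 1)], [(1, 0), (1, 1)])
```

The example also hints at the distortion of the embedding: cells
(0, 0) and (1, 0) are adjacent in the plane but lie at opposite ends
of the 2 × 2 curve.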

In the three clustering-based approaches, after computing the 
clusters, all trajectories in a cluster are anonymized using the se- 
quence of axis-parallel Minimum Bounding Rectangles (MBR) (rect- 
angles whose sides are parallel to the x and y axis) that masks each 
snapshot of the clustered trajectories. We compare the execution 
time and cost of anonymization obtained using the four algorithms 
described above with those of Traj-anon. To make a fair compari- 
son, we modify the output of Traj-anon and replace the quadrants 
in the cloak sequence with axis-parallel MBRs ensuring that the 
MBR that replaces a quadrant must be included in the quadrant. 

We implemented the Baseline, Fast and Slow Clustering algorithms
in C++ and obtained the Hilbert-based Clustering binaries
from the authors of [30] (compiled for Cygwin on Windows XP).

Figure 11 shows the cost of anonymizing an increasing number of
trajectories (10k, 50k, 100k, 200k, 600k and 1M) of length
30, using Traj-anon and the clustering-based algorithms described
above (results for length 10 are similar and not shown here; we
stopped at 1M trajectories as the other algorithms did not scale).

Comparison with snapshot-based Baseline. As shown in Figure 11,
the Baseline approach has the highest cost among all the
anonymization algorithms, significantly higher than that of
Traj-anon. For 600k trajectories and more, the cost of anonymization
with Baseline is 100 times that of Traj-anon. This is because having
optimum cost for one snapshot leads to a bigger cost for other
snapshots when the trajectories diverge (since the trajectories in a
group must be anonymized together in all the snapshots).

Comparison with Fast Clustering. As shown in Figure 11, the
cost of anonymizing trajectories using fast clustering is higher
than that with Traj-anon and slow clustering. The difference in
anonymization cost increases with the number of trajectories,
and for 1 million trajectories of length 30 the anonymization cost
of fast clustering is 100 times that of Traj-anon. In terms
of execution time, as shown in Figure 12, fast clustering takes
significantly longer than Traj-anon. For example, Traj-anon takes
less than 1.5 min to anonymize 1 million trajectories of
length 30, compared to 370 min for fast clustering.

Comparison with Slow Clustering. As shown in Figure 11,
the cost of anonymizing trajectories using slow clustering is better
than that with Traj-anon, by a factor of roughly 4: for 600K (1M)
trajectories, Traj-anon obtains a cost of 20 x 10^^(28.5 x 10^''), 
Slow clustering a cost of 5 x 10^''(6.6 x lO^*"). But as shown 
in Figure 12, Slow Clustering is the slowest of all anonymization
techniques. It takes over 3 days for the slow clustering algorithm to
anonymize 1 million trajectories of length 30, in comparison to under
1.5 min with Traj-anon. The poor performance of Slow Clustering
is not just accidental (for this data set) but intrinsic to the algorithm
due to O(n^2) distance computations between n trajectories.

Comparison with Hilbert-based Clustering. Figure 13 compares
the anonymization cost of Hilbert-based clustering with that
of Traj-anon. We could not process more than 50k trajectories
because the Hilbert-based clustering implementation does not scale
beyond that. The cost of Hilbert-based clustering is considerably
higher than that of Traj-anon. Even for 10k trajectories the cost of
Hilbert-based clustering is 30 times higher than that of Traj-anon,
reaching 50 times for 40k trajectories. A possible reason for the
higher cost of Hilbert-based clustering is the distortion introduced
in mapping the 2-dimensional space to a single dimension.

As shown in Figure 14, the Hilbert-based clustering is considerably
slower than Traj-anon. It takes over 11 hours to anonymize
50k trajectories of length 30 using Hilbert-based clustering in com- 
parison to 12 sec by Traj-anon. Even though the Hilbert-based 
clustering uses a simpler distance function, it is slower due to the 
high number of comparisons to find the k nearest neighbors. 

6. RELATED WORK 

In the context of LBS, the two aspects of privacy that have re- 
ceived most attention are trajectory anonymity and sender anonymity. 

Trajectory anonymity. As detailed in Section 5, the line of
work on trajectory anonymity [25, 30, 28, 22] is complementary
to ours: its goal is to hide the user's precise location over a period
of time (one is not required to hide the identity of the user), while
sender anonymity hides the identity of the user, assuming that the
trajectory data falls into the attacker's hands. Even though the
problem of trajectory anonymity is orthogonal to the problem studied
in this paper, as described in Section 5 a class of clustering-based
solutions can be adapted to provide offline TP-aware sender
k-anonymity, and we have compared against them in detail.

Classes of Attackers. The solutions for sender k-anonymity
in the context of location-based services can be classified into four
categories based on the class of attackers they protect against:

Policy-unaware trajectory-unaware: The solutions [19, 24, 20]
in this class are also known as k-inside policies [16], as these
solutions use the tightest cloak (of a pre-defined shape) that
includes the sender and k−1 other users. This class of solutions
preserves privacy neither against a policy-aware attacker (as shown
in [16]) nor against a trajectory-aware attacker (as shown in
[12, 29, 15]).

Policy-aware trajectory-unaware: This class of solutions [11, 16]
ensures that there are at least k users anonymized using the
same cloak. The privacy guarantee of these solutions is strictly
stronger than that of the policy-unaware solutions, i.e. they also
defend against policy-unaware trajectory-unaware attackers but not
conversely (for a formal proof see [16]). But as shown in Section 1,
they fail to preserve privacy against a policy-aware attacker who is
also trajectory-aware.

Policy-unaware trajectory-aware: This class of solutions
[29, 18, 15, 12] targets anonymity against trajectory-aware
attackers using a sequence of cloaks that masks the user and the
same k−1 users for the entire duration of the user trajectory. [15]
claims policy-awareness as well, but the claim needs qualification,
as it is not clear what the attacker knows: a) the mapping from a
given set of user trajectories to the sequence of cloaks, or b) the
algorithm producing this mapping in addition to the mapping itself
(this is our sense of policy-awareness). We claim that [15] defends
against attacker class a) but not b), and thus gives a weaker
guarantee than the one we target here. We illustrate how the
2-sharing property of [15] allows policy-aware attackers to breach
privacy.



Figure 15: 2-sharing policy (panels: Sender=B, Sender=C; locations A, B, C)

Example 11. Consider the cloaking algorithm in [15] that
takes into account the requesting location to generate cloaking groups
(sets of locations that are cloaked to the same region). For the
locations in Figure 15, if the first request is made by C the
algorithm groups C with B, whereas if the first request is made by B
then B and A end up in the same cloaking group to satisfy the
2-sharing property. In the case when the first anonymized request
contains the cloak corresponding to {C, B}, a policy-aware attacker
immediately infers that the sender is C. □
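
The attacker's reasoning in Example 11 amounts to inverting the
policy. The sketch below encodes the two cases stated in the example;
the group assigned to a first request from A is not given in the
example and is an assumption made here only to complete the table.

```python
# Cloaking group produced for each possible first sender.
# The entries for B and C follow Example 11; the entry for A is
# an assumption of this sketch.
policy = {
    "A": frozenset({"A", "B"}),
    "B": frozenset({"A", "B"}),
    "C": frozenset({"B", "C"}),
}


def candidate_senders(observed_group):
    """A policy-aware attacker keeps only those senders for which
    the policy would have produced the observed cloaking group."""
    return {sender for sender, group in policy.items()
            if group == observed_group}


suspects_bc = candidate_senders(frozenset({"B", "C"}))
suspects_ab = candidate_senders(frozenset({"A", "B"}))
```

Although the group {B, C} contains two users, only one of them can
have triggered it, so the sender is identified.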

Policy-aware trajectory-aware: We are unaware of any work that
provides policy-aware and trajectory-aware sender k-anonymity, and
therefore we propose the guarantee in this paper. As illustrated
in Section 1, even this privacy guarantee does not allow publishing
the complete linkage between multiple requests sent by the
same user. It does allow publishing the requests made along a
trajectory bundle.

Trusted LBS. In the model used in this paper we assume that 
the LBS provider is a trusted entity and responsible for anonymiz- 
ing the user requests that it collects over a period of time. We 
share this assumption with a line of work on trajectory
anonymization [25, 30, 28, 10, 23, 22] where the location provider
(who logs user trajectories) is trusted and is responsible for
anonymization.

Online vs Offline. Another contrasting feature between previous
trajectory-aware sender anonymity proposals and the one in
this paper is the mode of anonymization. In [29, 18, 15, 12] LBS
requests are anonymized as they are issued, i.e. online, while we
anonymize the request log, i.e. offline. One can possibly use the
online solutions for offline TP-aware sender anonymization, but
with a necessarily sub-optimal cost since the future movement of
the users is not known by the online anonymizer. The cloak that
masks a group of k users can become arbitrarily large if their
trajectories diverge.

Beyond sender k-anonymity: l-diversity. In the setting of
relational table anonymization, k-anonymity is viewed as a classical
baseline, recently subsumed by stronger guarantees ranging from
l-diversity [21] to differential privacy. For the LBS context, this
raises the natural question of analogous guarantees that subsume
sender k-anonymity. We note that LBS sender privacy is a much
younger field, in which even such a fundamental guarantee as sender
k-anonymity (especially in its TP-aware form) had not been solved
until now. We also note that in an LBS context, sender k-anonymity
is not weaker than sender l-diversity, actually coinciding with it.
To see why, consider an analogy to the "homogeneity attack" that
breaks classical k-anonymity but is foiled by l-diversity [21]: there
is a possibility that all user histories masked by a bundle send
identical requests in a particular snapshot (possibly from different
locations). But since a bundle associates a set of requests (no
duplicates) with a cloak, all the identical requests are represented
by a single request, thus precluding the homogeneity attack, and in
fact any attack based on the distribution of request values.

7. CONCLUSIONS 

We introduce and study the problem of offline trajectory- and policy- 
aware sender k-anonymity. We show that prior results for snapshot 
k-anonymity do not apply and that trajectory-awareness leads to 
strictly stronger attackers, calling for a stronger privacy guarantee. 



We show that optimum TP-aware anonymization is computationally
harder than snapshot P-aware anonymization (NP-complete vs.
PTIME). We propose a PTIME l-approximation algorithm for
trajectories of length l and empirically show its effectiveness.

APPENDIX
A. PROOFS
A.1 Lemma 1

Proof. (a) Let U be a set of n trajectories and let policies P_1 and
P_2 be equivalent for anonymizing U w.r.t. a G-tree T. We describe
the cost of anonymizing U using P_1 as:

$$\mathrm{Cost}(P_1, U) = \mathrm{Cost}(m_1) + \mathrm{Cost}(m_2) + \ldots + \mathrm{Cost}(m_n)$$

where m_i = P_1(U, u_i) ∈ T for 1 ≤ i ≤ n. Note that, for i ≠ j, m_i
and m_j can be the same node in T. Similarly we describe the cost
of anonymizing U using P_2 as:

$$\mathrm{Cost}(P_2, U) = \mathrm{Cost}(m'_1) + \mathrm{Cost}(m'_2) + \ldots + \mathrm{Cost}(m'_n)$$

where m'_i = P_2(U, u_i) ∈ T for 1 ≤ i ≤ n. Note that, for i ≠ j,
m'_i and m'_j can be the same node in T. Since P_1 and P_2 are
equivalent, if a node m ∈ T is used by P_1 to anonymize x
trajectories in U then m is also used by P_2 to anonymize the same
number of trajectories in U. Therefore,

$$\mathrm{Cost}(m_1) + \mathrm{Cost}(m_2) + \ldots + \mathrm{Cost}(m_n) = \mathrm{Cost}(m'_1) + \mathrm{Cost}(m'_2) + \ldots + \mathrm{Cost}(m'_n) \qquad (1)$$

because each quadrant appears the same number of times on both sides
of Equation 1. Hence we have:

$$\mathrm{Cost}(P_1, U) = \mathrm{Cost}(P_2, U)$$

(b) Suppose P_1 provides TP-aware sender k-anonymity to U w.r.t.
T. Therefore, for each node m ∈ T, P_1 anonymizes either none
or at least k trajectories using m. We are given that P_1 and P_2 are
equivalent for T, therefore they both anonymize the same number
of trajectories using m. Therefore, P_2 anonymizes either none or
at least k trajectories using m. Thus, P_2 also provides TP-aware
sender k-anonymity to U w.r.t. T. Similarly we can show that if P_2
provides TP-aware sender k-anonymity to U, so does P_1.

□ 

A.2 Lemma 2

Proof. Let U be a set of trajectories and T be a G-tree of
quad-cloak sequences. Let P be a policy that uses the cloak
sequences from T and C be the configuration representing the class
of policies equivalent to P.

First we assume that P provides TP-aware sender k-anonymity
and show that C is a k-summing configuration. Since P is TP-aware
sender k-anonymous, each quad-cloak sequence in T is used in P
to anonymize either none or at least k trajectories. Thus

• For a leaf node m ∈ T

(i) if d(m) < k, then P cannot anonymize any trajectory
using m, therefore C(m) = d(m).

(ii) if d(m) ≥ k, then P could anonymize either at least k
trajectories or none. In the former case C(m) ≤ d(m) − k,
while in the latter case C(m) = d(m).

• For an internal node m ∈ T, let A = Σ_i C(m_i),
where the m_i range over the children of m in T

(iii) if A < k then there are fewer than k trajectories passed up
by the children of m. Thus P cannot anonymize any trajectory
using m and therefore we have C(m) = A.

(iv) if A ≥ k then the children of m pass up at least k
trajectories. Therefore, P could anonymize either at least
k trajectories or no trajectory using m. In the former case
C(m) ≤ A − k, while in the latter case C(m) = A.

Thus C satisfies the k-summing property.

Next we assume that C is a k-summing configuration and show
that P provides TP-aware sender k-anonymity to U. Equivalently,
we show that under C, each cloak of T is used to anonymize either
none or at least k trajectories. Since C is a k-summing
configuration, it implies that:

• for a leaf node m ∈ T

(i) if d(m) < k, then C(m) = d(m). Thus P does not
anonymize any trajectory using m.

(ii) if d(m) ≥ k, then either C(m) = d(m) or
C(m) ≤ d(m) − k. In the latter case P anonymizes
at least k trajectories using m, while in the former case
none at all.

• for an internal node m ∈ T, let A = Σ_i C(m_i),
where the m_i range over the children of m in T

(iii) if A < k, then C(m) = A. Thus P does not anonymize
any trajectory using m.

(iv) if A ≥ k, then either C(m) = A or C(m) ≤ A − k.
In the latter case P anonymizes at least k trajectories
using m, while in the former case none.

Therefore P provides TP-aware sender k-anonymity.

□ 
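
The case analysis in this proof translates into a short recursive
check. The sketch below is an illustration under assumptions: the
tree is a hypothetical nested dict in which leaves carry `count`
(that is, d(m)), and the configuration `C` maps a node id to the
number of trajectories that node passes up.

```python
def is_k_summing(node, C, k):
    """Check cases (i)-(iv): every node either passes up everything
    it receives, or keeps (anonymizes) at least k trajectories."""
    children = node.get("children", [])
    if children:
        if not all(is_k_summing(child, C, k) for child in children):
            return False
        incoming = sum(C[child["id"]] for child in children)  # A
    else:
        incoming = node["count"]  # d(m)
    passed_up = C[node["id"]]
    return passed_up == incoming or 0 <= passed_up <= incoming - k


tree = {"id": "root", "children": [{"id": "a", "count": 3},
                                   {"id": "b", "count": 1}]}
# a keeps 2 and passes up 1; b passes up 1; the root keeps both.
good = {"a": 1, "b": 1, "root": 0}
# a keeps only 1 trajectory, fewer than k = 2.
bad = {"a": 2, "b": 1, "root": 0}
```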

A.3 Lemma 3

Proof. The intuition behind Lemma 3 is that the Traj-anon
algorithm exhausts the search space of all potential optimal
k-summing configurations by utilizing Property 2. To prove it
formally, we use structural induction to show that for each node m
in the G-tree T and each integer l such that l ≤ d(m), we have

$$cost_{alg}(m, l) = cost_{min}(cset(m, l))$$

where cost_alg(m, l) represents the cost computed by Traj-anon
for passing up l (unanonymized) trajectories at m, and
cost_min(cset(m, l)) represents the minimum cost of passing up l
trajectories at m among all such k-summing configurations.

Basis: For a leaf node m and an integer l ≤ d(m), it is obvious by
construction that cost_alg(m, l) = cost_min(cset(m, l)).

Induction: Let m be a non-leaf node in the G-tree T and let
m_1, m_2, ..., m_f be the children of m. Let l be an integer such
that l ≤ d(m) and let cset(m, l) be the set of k-summing
configurations that pass up l (unanonymized) trajectories at node m.
We show that for each configuration g ∈ cset(m, l),
cost_alg(m, l) ≤ cost(g(m)), where cost(g(m)) represents the cost of
g at node m. If g(m_1) = l_1, g(m_2) = l_2, ..., and g(m_f) = l_f,
the cost of m in g can be written as

$$cost(g(m)) := cost(g(m_1)) + cost(g(m_2)) + \cdots + cost(g(m_f)) + cost(m) \times (l_1 + l_2 + \cdots + l_f - l)$$

By the induction hypothesis we assume that
cost_alg(m_i, l_i) ≤ cost(g(m_i)) for each 1 ≤ i ≤ f. Therefore

$$cost_{alg}(m_1, l_1) + cost_{alg}(m_2, l_2) + \cdots + cost_{alg}(m_f, l_f) \le cost(g(m_1)) + cost(g(m_2)) + \cdots + cost(g(m_f))$$

Adding the constant value cost(m) × (l_1 + l_2 + ... + l_f − l)
to both sides, and noting that Traj-anon picks the values passed up
by the children so as to minimize the resulting left-hand side, we
get

$$cost_{alg}(m, l) \le cost(g(m))$$

Similarly, for each node m, each integer l ≤ d(m),
and each configuration g ∈ cset(m, l), we can show that
cost_alg(m, l) ≤ cost(g(m)). Therefore cost_alg(m, l) =
cost_min(cset(m, l)). □

A.4 Lemma 4

Proof. We describe the cost of anonymizing the set U of n
trajectories using policy P as:

$$\mathrm{Cost}(P, U) = \sum_{u \in U} \mathrm{Cost}(P(U, u)) = \mathrm{Cost}(m_1) + \mathrm{Cost}(m_2) + \ldots + \mathrm{Cost}(m_n) \qquad (2)$$

where m_i = P(U, u_i) ∈ T for 1 ≤ i ≤ n. Since, for i ≠ j, m_i and
m_j can be the same node in T, we can rewrite the above equation
as follows:

$$\mathrm{Cost}(P, U) = \sum_{m \in T} f'(m, P) \times \mathrm{Cost}(m)$$

where f'(m, P) is the number of trajectories anonymized by P
using cloak sequence m. Since C represents the equivalence class
of P,

$$\forall m \in T, \quad f'(m, P) = f(m, C)$$

where f(m, C) is as defined in Definition 4. Therefore the cost of
configuration C of T can be written as:

$$\mathrm{Cost}_c(C, U) = \sum_{m \in T} f(m, C) \times \mathrm{Cost}(m) = \sum_{m \in T} f'(m, P) \times \mathrm{Cost}(m) = \mathrm{Cost}(P, U) \qquad (3)$$

□ 
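
The regrouping used in this proof, from a sum over trajectories to a
sum over cloak sequences weighted by usage counts, can be checked on
a toy instance. The node names, costs and assignment below are
hypothetical values chosen for illustration.

```python
from collections import Counter


def cost_by_policy(assignment, node_cost):
    """Sum over trajectories u of Cost(P(U, u)), as in Equation 2."""
    return sum(node_cost[m] for m in assignment.values())


def cost_by_configuration(assignment, node_cost):
    """Sum over nodes m of f'(m, P) x Cost(m)."""
    usage = Counter(assignment.values())  # f'(m, P) for each used node
    return sum(n * node_cost[m] for m, n in usage.items())


node_cost = {"m1": 4, "m2": 16, "m3": 64}
assignment = {"u1": "m1", "u2": "m1", "u3": "m2"}  # policy P on U
per_trajectory = cost_by_policy(assignment, node_cost)
per_node = cost_by_configuration(assignment, node_cost)
```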

A.5 Lemma 5

Proof. First we give the intuition of this proof. Let C_u be the
optimum k-summing configuration for T_u. Notice that every node
in T_u is also a node in the corresponding T_usq. Therefore C_u is
a valid but not necessarily optimal k-summing configuration for
T_usq. Consequently, the cost of the optimum k-summing
configuration for T_usq is not more than the cost of C_u.

To prove it formally, we can define a configuration C' for T_usq
as follows:

• C'(m) = C_u(m) for m ∈ T_usq and m ∈ T_u

• C'(m) = Σ_{i=1..4} C'(m_i) for m, m_1, ..., m_4 ∈ T_usq and
m ∉ T_u, where m_1, ..., m_4 are the child nodes of m

The above conditions ensure that C' only uses those nodes of T_usq
for anonymization that are also in T_u. Each node m that is not in
T_u passes up all the trajectories that are passed up by m's child
nodes, to be anonymized by m's ancestors. C' is a valid
configuration since, for all m, C'(m) ≤ d(m) and
C'(m) ≤ Σ_{i=1..4} C'(m_i). C' is k-summing since C_u is k-summing,
and Cost(C') = Cost(C_u) since the newly inserted nodes are not used
for anonymization and the nodes that are used for anonymization have
the same cost in the two G-trees. □



A.6 Lemma [7]

Proof. Let $U$ be a set of user trajectories, $B$ a USeq-Btree, $C$
an optimal configuration of $B$, and $P$ a policy it represents. Suppose
there is a node (cloak sequence of semi-quadrants) $m \in B$
such that $k(h(m) + 1) < C(m) \le d(m)$. Then by the pigeonhole
principle, there is a set $S$ of at least $k$ trajectories such that (i) all
the trajectories in $S$ are masked by $m$, (ii) each trajectory in $S$
is anonymized by $P$ using some of the $h(m)$ ancestors of $m$, and
(iii) if all the trajectories in $S$ are removed, the cloak sequences they
were mapped to under $P$ continue to anonymize at least $k$ trajectories.
We then construct a policy $P'$ that anonymizes the trajectories
in $S$ using $m$ instead of its ancestors. $P'$ continues to be TP-aware
sender k-anonymous, but has lower cost, contradicting the optimality
of $P$. □

A.7 Theorem [1] 

Proof. We prove this by reducing the problem of optimum
k-anonymization of relational tables on binary alphabet, shown to
be NP-hard in [26, 13], to the optimum offline TP-aware sender
k-anonymity with quad-cloaks.

We first briefly describe the problem of optimum k-anonymization
of relational tables on binary alphabet with suppression. Let $T$ be a
relational table with $m$ columns where each tuple contains data
corresponding to a unique user. The tuples of $T$ can be considered to
be $m$-dimensional vectors $v_i$ drawn from $\Sigma^m$, where $\Sigma = \{0, 1\}$.
Thus $T$ can also be represented as a subset $T \subseteq \Sigma^m$. Let ★ be a
fresh symbol not in $\Sigma$. Suppression is defined as follows:

Definition 6 (Suppressor). Let $f$ be a map from $T$ to
$(\Sigma \cup \{\star\})^m$. We say $f$ is a suppressor of $T$ if for all $t \in T$ and
$j = 1, \ldots, m$ it is the case that $f(t)[j] \in \{t[j], \star\}$.

Intuitively, a suppressor function replaces the values of certain
attributes in certain tuples with ★. The idea behind a suppressor
function is that by replacing values of certain attributes in a set of
tuples with ★, it can make all of them identical such that an attacker
cannot associate a tuple in that set to the actual user. This is
formalized in the following definition.

Definition 7 (k-Anonymity). Let $T$ be a relational table
and $f$ a suppressor function. The anonymized table $f(T)$ is said to
be k-anonymous if for any $t \in T$, there exist $k$ distinct vectors
$t_1, \ldots, t_k$ in $T$ such that $f(t_1) = f(t_2) = \cdots = f(t_k) = f(t)$.

Since there can be many possible suppressor functions, the goal
is to find one that minimizes the number of suppressed values, i.e.
the number of ★ symbols in the anonymized table. The problem of optimum
k-anonymization of relational tables on binary alphabets is defined
as follows.

Definition 8 (Optimum k-anonymity). Given a relational
table $T \subseteq \Sigma^m$, and a positive integer $c \in \mathbb{N}$, is there a suppressor
$f$ such that $f(T)$ is k-anonymous, and the total number of vector
coordinates suppressed in $f(T)$ is at most $c$?
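
Definitions 6 to 8 can be made concrete with a small sketch. It assumes tables are lists of equal-length tuples over {0, 1} and uses the string `"*"` in place of ★. The function names are illustrative only.

```python
from collections import Counter

STAR = "*"  # stands in for the fresh symbol that marks a suppressed value


def is_suppressor(table, suppressed):
    """Definition 6: each output coordinate is the original value or STAR."""
    return len(table) == len(suppressed) and all(
        len(t) == len(s) and all(sj in (tj, STAR) for tj, sj in zip(t, s))
        for t, s in zip(table, suppressed)
    )


def is_k_anonymous(suppressed, k):
    """Definition 7: every suppressed tuple is shared by at least k rows."""
    counts = Counter(suppressed)
    return all(counts[s] >= k for s in suppressed)


def suppression_cost(suppressed):
    """Total number of suppressed coordinates (the quantity bounded by c)."""
    return sum(s.count(STAR) for s in suppressed)


# Hypothetical toy table with three tuples and three columns.
table = [(0, 1, 1), (0, 1, 0), (0, 0, 1)]
suppressed = [(0, STAR, STAR)] * 3
```

In this toy instance, suppressing the last two columns of every tuple yields a 3-anonymous table at a cost of six suppressed coordinates. This sketch only verifies a given suppressor; it does not search for an optimal one.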

For ease of presentation we use k = 3 and reduce optimum
3-anonymity of relational tables on binary alphabet to optimum offline
TP-aware sender 3-anonymity. The relational problem we reduce from
is proved NP-hard in [26, 13]. Given a relational
table $T$ with $n$ $m$-dimensional tuples over the binary alphabet {0, 1},
we create a set of $n$ user-history objects with trajectories of length
$m$. For each tuple $t_i$, we create a user-history object $u_i$ and set the
location at the $j$-th snapshot of the trajectory as

• $(i, i)$ if the $j$-th column in $t_i$ has the value 0.

• $(nr + i, i)$ otherwise, where $r = 2^{\lceil \log \sqrt{2mn} \rceil}$.

Figure 16: Locations corresponding to the binary data in a Relational Table

We construct a quad-tree $Q$ such that the root node represents
the square region $(0, 0)$ (left-bottom coordinates) to $(2nr, 2nr)$
(right-top coordinates) as shown in Figure 16. The root quadrant
is divided into 4 equal square sub-quadrants. We show that the
cost of the optimal 3-ANONYMITY solution for $T$ is at most $c$ if
and only if the cost of the optimum policy that provides TP-aware
sender k-anonymity to the set of users $U$ constructed above is at
most $(4c + 1)n^2r^2$.
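
The construction can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: a 0 in tuple $t_i$ is assumed to map to the point $(i, i)$ and a 1 to $(nr + i, i)$, and $r$ is assumed to be the smallest power of two with $r^2 \ge 2mn$. The function names are hypothetical.

```python
def scale_factor(m, n):
    """Smallest power of two r with r*r >= 2*m*n (assumed reading of r)."""
    r = 1
    while r * r < 2 * m * n:
        r *= 2
    return r


def build_trajectories(table):
    """Map each binary tuple t_i (1-indexed) to a trajectory of m locations."""
    n, m = len(table), len(table[0])
    r = scale_factor(m, n)
    trajectories = [
        [(i, i) if bit == 0 else (n * r + i, i) for bit in row]
        for i, row in enumerate(table, start=1)
    ]
    return trajectories, r


# Hypothetical toy table: n = 3 tuples, m = 2 columns.
table = [(0, 1), (1, 1), (0, 0)]
trajectories, r = build_trajectories(table)
n, m = len(table), len(table[0])
```

With this choice of $r$, the $mn$ small cloaks of area $n^2$ together cost less than a single root cloak's share of the bound, which is what lets the cost threshold count root cloaks.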

Suppose that the optimum quad-cloak policy $P$ has cost at most
$(4c + 1)n^2r^2$. We construct a suppressor $f$ that k-anonymizes $T$
as follows. For any $1 \le i \le n$ and
any $1 \le j \le m$, if the $j$-th location in the trajectory of $u_i$ is masked
by the root node of $Q$ in the cloak sequence used to anonymize $u_i$,
then $f(t_i)[j] = \star$, and $f(t_i)[j] = t_i[j]$ otherwise. Each root cloak
has area $4n^2r^2$, so given the upper bound on the cost of the policy
there can be at most $c$ locations in the trajectories of the user
objects that are masked by the root node of $Q$ in the cloak sequences
used to anonymize them. Therefore the cost of $f$ is at most $c$.
Moreover, since $P$ preserves sender 3-anonymity, there must be at
least 3 trajectories that are anonymized to the same cloak sequence,
and by construction these trajectories are anonymized the same way
by the suppressor $f$. Hence $f$ is 3-anonymous.

Next, assume $f$ is a suppressor that provides 3-anonymity to
$T$ and whose cost is at most $c$. Using $f$ we define a quad-cloak
policy $P$ for the set $U$ of user-history objects constructed above.
Policy $P$ assigns a quadrant to the $j$-th position of user trajectory
$u_i$ by looking up the value of $f(t_i)[j]$:

• If $f(t_i)[j] = \star$, then $P$ uses the biggest quadrant, $(0, 0)$ to
$(2nr, 2nr)$.

• If $f(t_i)[j] = 0$, then $P$ uses a cloak sequence with the quadrant
$(0, 0)$ to $(n, n)$ at the $j$-th position to anonymize user $u_i$.

• If $f(t_i)[j] = 1$, then $P$ uses a cloak sequence with the quadrant
$(nr, 0)$ to $(nr + n, n)$ at the $j$-th position to anonymize
user $u_i$.

$P$ is a valid policy as every cloak used masks the corresponding
location. For any two tuples $t_a$ and $t_b$, $f(t_a) = f(t_b)$ implies that
$P$ uses the same cloak sequence to anonymize user-history objects
$u_a$ and $u_b$. Given that $f$ provides 3-anonymity to $T$, there must be
at least 3 users that are anonymized using the same cloak sequence,
hence $P$ provides TP-aware 3-anonymity to the set of users $U$.
Furthermore, since the cost of $f$ is at most $c$, there are at most $c$
suppressions and hence at most $c$ locations in the trajectories of
the users in $U$ are anonymized by $P$ to the root node of $Q$. The sum
of the areas of these cloaks is at most $4cn^2r^2$. The remaining
locations in the trajectories of users of $U$ are anonymized using
cloaks of size $n^2$. Since there are at most $mn$ such locations, the
total cost of $P$ is at most

$$4cn^2r^2 + mn^3 < 4cn^2r^2 + 2mn^3 \le (4c + 1)n^2r^2$$

where the last inequality holds because $r^2 \ge 2mn$.



□ 



A.8 Theorem H 



Figure 17: Quad-cloak policy P (every cloak in the sequence has size at most R; the largest has size R)

Figure 18: Uniform quad-cloak policy P'

Proof. We prove this theorem by showing that any quad-cloak
policy $P$ that provides TP-aware sender k-anonymity can be
transformed to a uniform quad-cloak policy $P'$ that provides the same
TP-aware sender k-anonymity guarantee, and for every input
trajectory $u$, its corresponding cloak sequence $P'(u)$ has a cost
that is at most $l$ times the cost of the cloak sequence $P(u)$.

The policy $P'$ is constructed as follows. For each user trajectory
$u$, let the cloak sequence of $P(u)$ be $s = (q_1, q_2, \ldots, q_l)$, shown
in Figure 17. Let $R$ be the size of the biggest cloak in $s$. Now
to construct $P'$, we "expand" each $q_i$ in $s$ to size $R$ as shown in
Figure 18. More specifically, let $f_R(x)$ be the lowest ancestor of
$x$, or $x$ itself, that has size $R$. Then $P'$ will anonymize $u$ using the
sequence of cloaks $s' = (f_R(q_1), f_R(q_2), \ldots, f_R(q_l))$. In this
way, the cost of cloak sequence $s'$ is $l \times R$, which is at most
$l \times cost(s)$ because $cost(s) \ge R$. It then follows that the overall
cost of $P'$ is at most $l$ times the overall cost of $P$.
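
The expansion step can be sketched in a few lines. The sketch assumes each quadrant is stored as `(x, y, side)`, with `side` a power of two and the lower-left corner aligned to a multiple of `side`, and it takes the cost of a cloak to be its area. This representation and the function names are illustrative assumptions.

```python
def expand_to_uniform(seq):
    """Replace every quadrant by its ancestor (or itself) of the largest side."""
    big = max(side for _, _, side in seq)
    # Snapping the corner down to a multiple of `big` yields the enclosing
    # quad-tree ancestor, because quadrants are grid-aligned.
    return [((x // big) * big, (y // big) * big, big) for x, y, _ in seq]


def seq_cost(seq):
    """Cost of a cloak sequence: sum of the areas of its quadrants."""
    return sum(side * side for _, _, side in seq)


def contains(outer, inner):
    """True if quadrant `outer` fully covers quadrant `inner`."""
    ox, oy, os_ = outer
    ix, iy, is_ = inner
    return ox <= ix and oy <= iy and ix + is_ <= ox + os_ and iy + is_ <= oy + os_


# Hypothetical cloak sequence of length l = 3.
s = [(4, 0, 2), (0, 0, 8), (6, 6, 1)]
s_uniform = expand_to_uniform(s)
```

Every expanded quadrant still covers the original one, so the expanded sequence masks the same locations, and its cost is at most `len(s)` times the cost of `s`.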

The constructed $P'$ also provides TP-aware sender k-anonymity
since all trajectories that were anonymized to the same cloak
sequence by $P$ will now be anonymized to the same cloak sequence
by $P'$. Finally, since the above results apply to any quad-cloak
policy, they apply to the optimum quad-cloak policy $P_{opt}$ as well.
That is, there exists a uniform quad-cloak policy, which is not
necessarily the optimum among all uniform quad-cloak policies, that
has a cost at most $l$ times the cost of $P_{opt}$. Consequently, the cost
of the optimal uniform quad-cloak policy is bounded by $l$ times the
cost of $P_{opt}$ as well. □

A.9 Theorem ID 

Proof. Let $U$ be a set of trajectories. Lemma [3] shows that
Traj-anon computes the optimum uniform quad-tree policy for
anonymizing $U$. According to Theorem [2], this optimum solution is an
$l$-approximation of the optimum quad-tree policy for anonymizing
$U$. Therefore, Traj-anon computes an $l$-approximation solution
to the problem of optimum TP-aware sender k-anonymity using
quad cloaks. □



A.10 Theorem 1

Proof. We prove this by reducing the decision version of optimum
policy-aware snapshot k-anonymization with circular cloaks
to the decision version of optimum offline TP-aware k-anonymization
with circular cloaks. First we briefly describe the problem of
policy-aware snapshot k-anonymization with circular cloaks.

Let $D$ be an instance of a location database with schema $S =$
{userid, locx, locy} and $SC$ be a set of points in 2-dimensional
space. A snapshot policy with circular cloaks is defined as a
deterministic function that maps locations in $D$ to circular cloaks, each
centered at some point from $SC$, with no restriction on radius. The
cost of a snapshot policy with circular cloaks is computed as:

$$Cost_s(P, D) = \sum_{l \in D} Cost(P(D, l))$$

where the cost of the cloak $P(D, l)$ is the area of the circular cloak.

Definition 9 (Snapshot k-anonymity with circular cloaks).
Given an instance $D$ of a location database, a set $SR$ of
points in 2-dimensional space, and a cost bound $C$, is there a snapshot
policy $P$ with circular cloaks that provides policy-aware sender
k-anonymity and whose cost satisfies $Cost_s(P, D) \le C$?

We reduce an instance $I$ of the above problem to an instance $I'$
of the optimum offline TP-aware sender k-anonymity with circular
cloaks. For each tuple $t \in D$, we create a user-history object
$u$ with a trajectory of length 1 and set u.userid() = t.userid and
u.location(1) = (t.locx, t.locy). Let the resulting set of
user-history objects be $U$. We create an instance $I'$ of optimum offline
TP-aware k-anonymization with circular cloaks using $U$ and $SR$.
We prove that there is a snapshot policy $P$ with circular cloaks that
provides policy-aware snapshot sender k-anonymity w.r.t. $D$ with
$Cost_s(P, D) \le C$ if and only if there is an anonymization policy
$P_t$ that provides TP-aware sender k-anonymity w.r.t. $U$
with $Cost(P_t, U) \le C$.

Let $P_t$ be an offline policy that uses circular cloaks and provides
TP-aware sender k-anonymity to the set $U$ of user-history objects
(constructed above), with cost $Cost(P_t, U) \le C$.
Since the user-history objects in $U$ are of length 1, the bundles
obtained with $P_t$ are also of length 1. We use $P_t$ to obtain a snapshot
policy $P_s$ for $D$ as follows. For each tuple $t \in D$, we define
$P_s(D, (t.locx, t.locy)) = b.cloak(1)$ where $b = P_t(U, u)$ for the
user-history object $u$ such that u.userid() = t.userid. Since $P_t$
provides TP-aware sender k-anonymity, at least $k$ user-history
objects are anonymized to each bundle $b$ that it uses. Therefore the
policy $P_s$ as defined above also anonymizes at least $k$ locations
to the circular cloak in bundle $b$. Hence $P_s$ provides policy-aware
snapshot k-anonymity. Moreover, since $Cost(P_t, U) \le C$ and
$Cost(P_t, U) = Cost_s(P_s, D)$, we have $Cost_s(P_s, D) \le C$.

Conversely, suppose there exists a snapshot policy $P_s$ with circular
cloaks that provides policy-aware sender k-anonymity to $D$ and whose
cost satisfies $Cost_s(P_s, D) \le C$. We use $P_s$ to obtain a policy $P_t$ as
follows. For every user-history object $u \in U$ corresponding to
the tuple $t \in D$ such that u.userid() = t.userid, we define
$P_t(U, u) = b$ where $b$ is a bundle of length 1 and $b.cloak(1) =
P_s(D, (t.locx, t.locy))$. Since $P_s$ provides policy-aware snapshot
sender k-anonymity, at least $k$ locations are anonymized
to each cloak that it uses. Therefore at least $k$ user-history
objects are anonymized to the same bundle, and $P_t$ is TP-aware
sender k-anonymous. Moreover, since $Cost_s(P_s, D) \le C$ and
$Cost(P_t, U) = Cost_s(P_s, D)$, we have $Cost(P_t, U) \le C$. □
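
The reduction amounts to wrapping each snapshot location as a trajectory of length 1 and each cloak as a bundle of length 1. The sketch below illustrates this under assumed representations: database tuples are `(userid, locx, locy)`, a circular cloak is `(center, radius)`, and a bundle is a list of cloaks. All names are hypothetical.

```python
import math


def snapshot_to_trajectories(db):
    """Turn each tuple (userid, locx, locy) into a user-history object of length 1."""
    return [{"userid": uid, "trajectory": [(x, y)]} for uid, x, y in db]


def snapshot_policy_to_offline(ps):
    """Wrap a snapshot policy (location -> cloak) as an offline policy (user -> bundle)."""
    def pt(user):
        return [ps(user["trajectory"][0])]  # bundle of length 1
    return pt


def circle_area(cloak):
    _center, radius = cloak
    return math.pi * radius ** 2


def snapshot_cost(ps, db):
    return sum(circle_area(ps((x, y))) for _, x, y in db)


def offline_cost(pt, users):
    return sum(sum(circle_area(c) for c in pt(u)) for u in users)


# Hypothetical instance: three users, one candidate center, one shared cloak.
db = [("a", 1.0, 2.0), ("b", 2.0, 1.0), ("c", 0.0, 3.0)]
ps = lambda loc: ((0.0, 0.0), 5.0)
users = snapshot_to_trajectories(db)
pt = snapshot_policy_to_offline(ps)
```

Because every bundle holds exactly one cloak, the two cost functions agree, which is the equality $Cost(P_t, U) = Cost_s(P_s, D)$ used in both directions of the proof.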



